At the IT2025 IEEE Conference in Žabljak, researchers from the University of Donja Gorica presented a study on voice cloning and text-to-speech (TTS) technology for cultural heritage preservation. The study compared state-of-the-art AI models, including Realtime Voice Cloning (RVC), Tortoise AI, Bark, and Coqui AI, to evaluate whether small, high-quality datasets can produce more accurate and natural-sounding speech than large, unstructured ones. It highlights the potential of AI for preserving the Montenegrin language and oral traditions through audiobooks, digital archives, and interactive experiences, and it points toward more accessible educational resources and broader cultural engagement built on AI-driven speech synthesis.
ABSTRACT – This research presents a comparative analysis of modern voice cloning systems, focusing on their ability to generate high-quality speech from limited training data. The paper aims to demonstrate that carefully curated smaller datasets can produce results superior to those obtained from larger, less structured datasets. By investigating multiple state-of-the-art models, including Realtime Voice Cloning (RVC), Tortoise AI, Bark, and Coqui AI, the study establishes optimal data preparation protocols and identifies critical factors in training data quality, with particular emphasis on applications for the Montenegrin language and cultural preservation.
