16:23, 10 May 2026

Russian Researchers Build an Offline AI Music Generator That Writes and Performs Songs

The system’s defining feature is complete independence from foreign platforms and the ability to operate entirely offline.

Researchers at Novosibirsk State Technical University, or NGTU, have developed software for generating audio content using neural network models. The system can create three kinds of output: instrumental music, full songs with vocals, and sound effects. It runs without cloud services on consumer-grade hardware with less than 6 GB of VRAM.

A Full Music Production Pipeline

The platform generates music from text prompts without requiring an internet connection. According to the developers, the software supports the entire audio production workflow, from text input to exporting tracks in widely used formats including WAV, MP3 and FLAC. The system has also been optimized for Russian-language prompts.

“Technically, the processing pipeline works in several stages: a language model builds a semantic ‘framework’ for the composition using a chain-of-thought approach, a diffusion transformer performs acoustic synthesis in latent space and the export module converts the result into target formats. The built-in graphical interface allows users to manage projects, save and load generation presets, preview results and export tracks without relying on third-party editing software,” project developer Artur Khusainov said.
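The three stages Khusainov describes can be illustrated with a toy sketch. This is not the NGTU code. Every name here (`plan_composition`, `synthesize`, `export_wav`, `generate`) is hypothetical, the "language model" is replaced by a trivial rule-based planner, and the "diffusion transformer" is replaced by plain sine tones. Only the shape of the pipeline, plan, then synthesis, then export, follows the description above.

```python
import io
import math
import struct
import wave
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical semantic 'framework' handed from stage 1 to stage 2."""
    tempo_bpm: int
    notes_hz: list


def plan_composition(prompt: str) -> Plan:
    # Stage 1 stand-in: a real system would use a language model here.
    # This toy planner derives a tempo and one pitch per word of the prompt.
    words = prompt.split()
    tempo = 120 if "fast" in words else 80
    base = 220.0
    notes = [base * 2 ** ((len(w) % 12) / 12) for w in words] or [base]
    return Plan(tempo_bpm=tempo, notes_hz=notes)


def synthesize(plan: Plan, sample_rate: int = 16000) -> list:
    # Stage 2 stand-in: a real system would run a diffusion transformer
    # in latent space. Here each planned note becomes one beat of sine tone.
    samples_per_note = int(sample_rate * 60 / plan.tempo_bpm)
    samples = []
    for freq in plan.notes_hz:
        for n in range(samples_per_note):
            samples.append(0.5 * math.sin(2 * math.pi * freq * n / sample_rate))
    return samples


def export_wav(samples: list, sample_rate: int = 16000) -> bytes:
    # Stage 3: convert float samples in [-1, 1] to 16-bit mono PCM WAV bytes.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)
    return buf.getvalue()


def generate(prompt: str) -> bytes:
    # Chain the three stages: text prompt in, WAV bytes out.
    return export_wav(synthesize(plan_composition(prompt)))
```

Calling `generate("fast drums")` returns WAV bytes that can be written straight to a file. Because each stage only passes a small, well-defined object to the next, a stage can be swapped out without touching the others.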


The team says future versions will expand the platform’s capabilities even further. Planned upgrades include support for spatial audio in virtual reality applications, integration with MIDI controllers for live interaction and adaptation for industrial production scenarios. Longer term, the software could be used to generate soundscapes for video games, automate dubbing for films and podcasts or produce audio for advertising.

Competition in the Creative AI Market

Music generation has become one of the fastest-growing segments of generative AI, a market currently dominated by foreign platforms such as Suno and Udio. The Novosibirsk system could become part of Russia’s broader push for technological sovereignty in creative AI tools.

Russian developers have already introduced domestic music-generation systems, including Sber’s SymFormer and SymFormerX, which generate music from text prompts using the CLaMP and SymFormer neural approaches. What distinguishes the NGTU project is its ability to operate inside closed environments, its low VRAM requirements and its simultaneous specialization in three audio categories: music, vocals and sound effects.

If the software evolves from an experimental system into a consumer-ready product, users could gain an accessible tool for composing music, dubbing video and producing game soundtracks without hiring a professional studio. For small businesses, that could significantly reduce audio production costs and lower the overall expense of advertising and media projects.

From Entertainment to Professional Creative Tools

For Russia, building a domestic generative AI stack for creative industries has become strategically important as the country looks to reduce dependence on foreign cloud platforms. By introducing a local, autonomous audio-generation system optimized for the Russian language environment, the NGTU team has contributed to the country’s broader technological sovereignty agenda. Import-independent software of this kind could accelerate the development of domestic creative industries.

Meanwhile, the AI-generated music market itself is likely to evolve beyond entertainment content and toward professional production tools. The next wave is expected to include multitrack generation, MIDI integration, stem export, video synchronization, adaptive game soundtracks and legally compliant work with artists’ voices and licensed music catalogs. That transition will require legal and technical frameworks to be designed in parallel with the technology itself.

“The key feature of our solution is complete independence from foreign platforms and the ability to operate inside a closed environment. We fine-tuned the base model using specialized LoRA adapters for three content categories: instrumental music, songs with vocals and sound effects. That allows us to avoid blending characteristics between formats and achieve much more precise alignment with user prompts.”

Yegor Antonyants

Assistant Professor in the Department of Automated Control Systems at Novosibirsk State Technical University and Project Lead
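The per-category LoRA setup Antonyants mentions can be sketched in a few lines. In LoRA, a frozen base weight matrix W is adjusted by a low-rank update, giving W + (alpha / r) * B * A, where r is the adapter rank. The sketch below is an illustration only, not the team's implementation. The 2x2 base matrix, the adapter values and the names `LoraAdapter`, `ADAPTERS` and `weights_for` are all made up for the example.

```python
from dataclasses import dataclass


def matmul(a, b):
    # Plain-Python matrix product of a (m x k) and b (k x n).
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


@dataclass
class LoraAdapter:
    a: list       # shape r x d_in
    b: list       # shape d_out x r
    alpha: float  # scaling hyperparameter

    def delta(self):
        # Low-rank weight update: (alpha / r) * B @ A.
        rank = len(self.a)
        scale = self.alpha / rank
        return [[scale * v for v in row] for row in matmul(self.b, self.a)]


def apply_adapter(base, adapter):
    # Return base + delta without modifying the shared base weights.
    return [
        [w + dw for w, dw in zip(w_row, d_row)]
        for w_row, d_row in zip(base, adapter.delta())
    ]


# Hypothetical frozen base weights shared by all categories.
BASE = [[1.0, 0.0], [0.0, 1.0]]

# One illustrative rank-1 adapter per content category (values are invented).
ADAPTERS = {
    "instrumental": LoraAdapter(a=[[1.0, 0.0]], b=[[0.5], [0.0]], alpha=1.0),
    "vocals": LoraAdapter(a=[[0.0, 1.0]], b=[[0.0], [0.5]], alpha=1.0),
    "sfx": LoraAdapter(a=[[1.0, 1.0]], b=[[0.25], [0.25]], alpha=1.0),
}


def weights_for(category: str):
    # Select exactly one adapter, so categories never blend.
    return apply_adapter(BASE, ADAPTERS[category])
```

Because only one adapter is applied at a time and the base matrix is never mutated, switching from `weights_for("vocals")` to `weights_for("sfx")` cannot carry over characteristics from the previous category. That is the separation the quote describes.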
