AI on a Diet: How Russian Mathematicians Are Teaching Neural Networks to Slim Down Without Losing Their Minds
Researchers at the Higher School of Economics (HSE) have shown that size is not everything. Their new method, Procrustes GPT, shrinks large language models by up to 36% while retaining as much as 95% of their performance. The advance suggests that the future of artificial intelligence may depend as much on mathematical elegance as on raw scale.

Gigabytes of “Mass”
Large language models are overdue for a trim. They are extraordinarily capable, but computationally voracious: their weights are measured in gigabytes, and running them requires clusters of servers. Researchers at the National Research University Higher School of Economics' Institute for AI and Computational Neuroscience have proposed a method that could fundamentally alter this equation. Procrustes GPT offers a way to put these digital giants on a strict diet while preserving up to 95% of their effectiveness.
“At the core of our work is an elegant mathematical idea – the Procrustes problem,” explains Ekaterina Grishina, a research intern at the Laboratory of Matrix and Tensor Methods in Machine Learning at HSE. “Like the mythological figure who adjusted travelers to fit his bed, the method finds an optimal orthogonal transformation that reshapes a model’s weight matrices into a simpler structure without distorting their essence. That is why we called it Procrustes GPT – and it is the key to compression without significant loss of quality.”
The method sounds almost like mathematical alchemy. Procrustes GPT reduces the size of a large language model by 25–36%. The neural network’s heavy “brain” becomes roughly one-third lighter, yet retains its ability to compose text, solve problems and sustain dialogue. The question is how.

The Procrustes Approach
Conventional slimming techniques such as quantization or pruning – removing “excess” neural connections – typically require painful retraining to restore lost performance. Procrustes GPT operates differently. It is a training-free optimization method that performs structural reduction of weight matrices. Instead of iterative fine-tuning, it applies carefully designed orthogonal transformations that simplify internal structure without re-educating the model.
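For readers who want to see the linear algebra behind the name, the short Python sketch below illustrates two ingredients the paragraph alludes to: the classical orthogonal Procrustes problem, whose closed-form solution comes from a singular value decomposition, and the fact that multiplying a weight matrix by an orthogonal matrix can be undone on the input side, so the layer computes exactly the same function. This is a generic illustration written for this article, not the HSE team's code; the "structured target" here is a simple sparsified copy of the matrix, standing in for whatever structured family the actual method optimizes toward.

    import numpy as np

    rng = np.random.default_rng(0)

    def procrustes_rotation(W, target):
        # Classical orthogonal Procrustes problem: find an orthogonal Q that
        # minimizes ||W @ Q - target|| in the Frobenius norm.
        # Closed-form solution: Q = U @ Vt, from the SVD of W.T @ target.
        U, _, Vt = np.linalg.svd(W.T @ target)
        return U @ Vt

    # A toy "weight matrix" and a structured target we would like it to resemble
    # (here simply a sparser copy; the real method defines its own structured family).
    W = rng.standard_normal((64, 64))
    target = W * (np.abs(W) > 1.0)        # keep only the large entries

    Q = procrustes_rotation(W, target)

    # Rotating the weights never changes what the layer computes, because the
    # rotation is undone on the input side: W @ x == (W @ Q) @ (Q.T @ x).
    x = rng.standard_normal(64)
    assert np.allclose(W @ x, (W @ Q) @ (Q.T @ x))

    # By construction, the rotated matrix is at least as close to the target.
    print(np.linalg.norm(W - target), np.linalg.norm(W @ Q - target))

The payoff comes from the structured form itself: a sparser or more regular matrix needs fewer numbers to store and fewer multiplications to apply, while the orthogonal rotation guarantees that nothing about the layer's behavior was altered in the process.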
This is not simply about saving disk space. It directly lowers the computational cost of inference – the stage at which a model generates responses. Fewer operations mean faster response times and lower energy consumption. In practice, that translates into cheaper deployment and broader accessibility.
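A rough back-of-envelope calculation shows why. The numbers below are illustrative values chosen for this article (a hypothetical 7-billion-parameter model stored in 16-bit precision), not figures reported by the HSE team.

    params = 7e9             # a hypothetical 7-billion-parameter model
    bytes_per_weight = 2     # 16-bit weights
    reduction = 0.36         # upper end of the reported 25-36% size reduction

    full_gb = params * bytes_per_weight / 1e9
    compressed_gb = full_gb * (1 - reduction)
    print(f"weights: {full_gb:.0f} GB -> {compressed_gb:.1f} GB")

    # Generating each token means multiplying activations against those weights,
    # so removing roughly a third of them trims a comparable share of the work.
    print(f"roughly {reduction:.0%} fewer weight operations per generated token")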
The primary beneficiaries are not only hyperscale data centers, but also what engineers sometimes call “small hardware.” Internet of Things devices, smartphones, smart speakers and automotive onboard systems operate under strict memory and power constraints. Running a modern LLM locally has often been impractical. Structural compression methods such as Procrustes GPT make this scenario increasingly plausible.
The implications extend beyond convenience. A smartphone capable of processing complex requests locally does not need to transmit sensitive data to the cloud. A home robot can support richer dialogue without relying entirely on remote servers. Reduced computational load also reduces electricity demand. For data centers that consume energy on the scale of small cities, efficiency gains are not just financial – they are environmental. The Russian research aligns squarely with the global push toward more sustainable and democratized AI.

The Race for Lightweight Intelligence
Over the past five years, the global research community has raced to rein in the growing appetite of neural networks. One milestone was SliceGPT, introduced in 2024. Like Procrustes GPT, it targeted post-training structural reduction. SliceGPT demonstrated that entire “slices” of a model could be removed without catastrophic accuracy loss on downstream tasks. It marked an important shift – retraining was no longer viewed as inevitable.
Around the same time, GPTQ gained traction. GPTQ relied on quantization, effectively coarsening model weights to reduce memory requirements. It became a practical standard for many open-source models, enabling them to run on mid-range GPUs.
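Quantization is easy to picture with a toy example. The snippet below is a plain round-to-nearest 8-bit quantizer written for this article; GPTQ's actual algorithm is considerably more careful about which rounding errors the model can absorb, but the storage saving comes from the same place.

    import numpy as np

    def quantize_int8(W):
        # Naive symmetric round-to-nearest quantization with one scale per matrix.
        # (GPTQ chooses its roundings far more cleverly; this only shows the idea.)
        scale = np.abs(W).max() / 127.0
        q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
        return q, scale

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 4)).astype(np.float32)

    q, scale = quantize_int8(W)
    W_hat = q.astype(np.float32) * scale      # dequantized approximation

    # Storing 8-bit integers instead of 32-bit floats cuts memory by a factor
    # of four, at the price of a small reconstruction error per weight.
    print(np.max(np.abs(W - W_hat)))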
Since 2024, preprint archives such as arXiv have filled with new proposals. Recent work such as DeltaLLM in 2025 has explored low-rank matrix decompositions as a compression strategy.
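The low-rank idea can likewise be sketched in a few lines. Below, a generic truncated SVD replaces one large matrix with two thin factors; this is the textbook version of low-rank compression, not DeltaLLM's specific formulation, and how well the approximation holds depends on how close the trained weights really are to low rank.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 1024))    # stand-in for a trained weight matrix

    # Keep only the top-r singular directions: W is approximated by A @ B,
    # where A is 1024 x r and B is r x 1024.
    r = 128
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]                     # absorb the singular values into A
    B = Vt[:r, :]

    original = W.size                        # 1024 * 1024 numbers to store
    factored = A.size + B.size               # 2 * 1024 * r numbers to store
    print(f"parameters: {original} -> {factored} ({factored / original:.0%})")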
Procrustes GPT belongs to this training-free optimization movement. Rather than reinventing the field, its authors refine it. Their emphasis on mathematically grounded structural transformations distinguishes the method from purely statistical compression approaches.

A Compact Mind’s Expanding Future
The emergence of Procrustes GPT strengthens Russia’s research profile in AI, but its broader significance lies in market applicability. Startups and corporations working with open-source LLMs could adopt the approach to lower hardware requirements and cut operational costs.
In the coming years, research is likely to intensify at the intersection of structural compression and adaptive expansion. Scientists will seek not only to trim networks, but to reinforce them selectively without increasing overall size. Procrustes GPT already points toward a future in which powerful AI systems no longer reside exclusively in massive cloud infrastructures, but operate efficiently on personal devices.