18:59, 09 May 2026

Russian Researchers Find AI Models Struggle With Long-Range Reasoning

Researchers at the AIRI Institute have developed a new benchmark showing that even the most advanced language models begin to fail, resorting to near-random answers, when forced to process large volumes of information. The research was presented at the ICLR 2026 conference in Brazil.


Researchers created MMReD, a benchmark designed to test how effectively AI systems reason over long contexts. Unlike conventional evaluations that require models to retrieve a single fact from a large body of text, MMReD forces systems to analyze entire chains of events, compare them, and draw conclusions. The researchers say this type of reasoning is essential for deploying AI in fields such as medicine, law, and finance.

The benchmark simulates an environment in which five characters move between six rooms. Models receive a sequence of observations ranging from one to 128 steps and must answer questions of varying complexity. Researchers tested 12 systems, including OpenAI’s GPT-4o, Alibaba Cloud’s Qwen2.5-VL-72B, and DeepSeek’s DeepSeek-R1. All of them showed sharp declines in performance as the amount of input data increased.
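The setup described above can be sketched as follows. This is a hypothetical illustration of an MMReD-style environment, not the benchmark's actual generator: the character and room names, the move-per-step rule, and the final-state question are all assumptions made for the sake of the example.

```python
import random

# Hypothetical MMReD-style setup (details assumed, not from the paper):
# five characters move between six rooms, and a model reading the
# observation log must answer questions about the chain of events.
CHARACTERS = ["Anna", "Boris", "Clara", "Dmitry", "Elena"]
ROOMS = [f"room_{i}" for i in range(1, 7)]

def generate_episode(n_steps, seed=0):
    """Generate a log of n_steps observations of character movements."""
    rng = random.Random(seed)
    positions = {c: rng.choice(ROOMS) for c in CHARACTERS}
    log = []
    for step in range(1, n_steps + 1):
        mover = rng.choice(CHARACTERS)          # one character moves per step
        positions[mover] = rng.choice(ROOMS)    # to a (possibly new) room
        log.append(f"Step {step}: {mover} moved to {positions[mover]}.")
    return log, positions

def final_state_question(positions, character):
    """A simple end-of-episode question; harder variants would require
    comparing entire chains of events rather than the final state."""
    return f"Which room is {character} in after the last step?", positions[character]

# An episode at the benchmark's maximum reported length, N = 128.
log, final = generate_episode(n_steps=128, seed=42)
question, answer = final_state_question(final, "Clara")
print(f"{len(log)} observations; {question} -> {answer}")
```

Answering even this simplest question requires tracking state across the whole log; the paper's harder question types, which compare and aggregate events, degrade model performance further as `n_steps` grows.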

“We observed more than a simple decline in quality on long-context tasks — it was a collapse of reasoning,” said Maksim Kurkin, a researcher at the FusionBrain laboratory at the AIRI Institute. “On some tasks with N=128, even leading reasoning-oriented models dropped to the level of random guessing.”

According to Kurkin, the issue is not tied to any single AI architecture, as all tested models displayed a similar degradation curve. In practice, the systems were able to use only 10% to 20% of the input information effectively. The researchers argue that solving the problem will require deeper architectural changes, including systems built around recurrent memory mechanisms.

