Reasoning Core: Scalable RL Environment Advances LLM Symbolic Reasoning With Verifiable Rewards

September 25, 2025

The challenge of equipping large language models with robust symbolic reasoning abilities remains a central goal in artificial intelligence, and Valentin Lacombe, Valentin Quesnel, and Damien Sileo, from Univ. Lille, Inria, CNRS, and Centrale Lille, present a significant step forward with their development of Reasoning Core. This new, scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR) moves beyond typical game-based benchmarks by procedurally generating problems spanning core formal domains such as planning, logic, and causal reasoning. Reasoning Core distinguishes itself through a virtually limitless supply of novel training instances, continuous difficulty control, and verification by external tools. Initial tests confirm that the environment poses a substantial challenge even for state-of-the-art language models, making it a promising resource for advancing the reasoning capabilities of future artificial intelligence systems.

The researchers developed a scalable environment that rigorously verifies solutions using external tools and controls task difficulty continuously. Unlike existing benchmarks that rely on games or isolated puzzles, Reasoning Core procedurally generates problems across six core formal domains: PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, equation system solving, and regular expression tasks, creating a versatile platform for evaluating reasoning capabilities. The system constructs diverse problem instances by varying parameters within each domain, ensuring broad coverage of reasoning patterns and a virtually limitless supply of novel training examples. For planning tasks, the generator creates PDDL-like domains with randomly constructed objects, actions, preconditions, and effects, challenging models to produce valid action sequences that achieve specified goals.
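To make the planning setup concrete, here is a minimal Python sketch of such a generator paired with a semantic plan checker. The propositional, STRIPS-style encoding and the names gen_planning_instance and check_plan are illustrative simplifications, not the paper's implementation.

```python
import random

def gen_planning_instance(n_facts=5, n_actions=4, seed=0):
    """Sketch: randomly construct a STRIPS-like planning task with
    propositional facts and actions defined by preconditions/effects."""
    rng = random.Random(seed)
    facts = [f"f{i}" for i in range(n_facts)]
    actions = {}
    for i in range(n_actions):
        pre = set(rng.sample(facts, rng.randint(1, 2)))
        add = set(rng.sample(facts, 1))
        delete = set(rng.sample(sorted(set(facts) - add), 1))
        actions[f"a{i}"] = (pre, add, delete)
    init = set(rng.sample(facts, rng.randint(1, n_facts)))
    goal = set(rng.sample(facts, 2))
    return init, goal, actions

def check_plan(plan, init, goal, actions):
    """Semantic validation: each action's preconditions must hold when it
    is applied, and the final state must satisfy the goal."""
    state = set(init)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:      # precondition violated: invalid plan
            return False
        state = (state - delete) | add
    return goal <= state

init, goal, actions = gen_planning_instance(seed=0)
print(check_plan(["a0", "a1"], init, goal, actions))
```

A real instance generator would additionally guarantee that at least one valid plan exists; this sketch omits that check for brevity.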

Solutions are validated both syntactically and semantically against the domain constraints, ensuring correctness and logical consistency. Equation system tasks evaluate the ability to solve linear equations: the generator constructs systems with known solutions and then probabilistically modifies them to create inconsistent or underdetermined cases, so that tasks span well-posed, overconstrained, and underconstrained systems (a sketch of this mechanism follows below). Regular expression tasks assess both regex matching and regex induction. For matching, the system generates patterns using a context-free grammar, incorporating character classes, quantifiers, and other operators, and challenges models to produce strings that the pattern would accept.
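For the equation system tasks, the snippet below builds a system with a known integer solution and then perturbs it. The specific perturbations, duplicating an equation with a shifted right-hand side to force inconsistency or dropping a constraint to leave the system underdetermined, are assumptions chosen to mirror the split described above, with SymPy's linsolve standing in for the external solver.

```python
import random
import sympy as sp

def gen_linear_system(n=3, seed=0):
    """Sketch: build an n-variable linear system with a known integer
    solution, then perturb it into one of three regimes. Coefficient
    rows are not checked for degeneracy in this toy version."""
    rng = random.Random(seed)
    xs = sp.symbols(f"x0:{n}")
    sol = [rng.randint(-5, 5) for _ in xs]
    rows = [[rng.randint(-3, 3) for _ in xs] for _ in range(n)]
    eqs = [sp.Eq(sum(a * x for a, x in zip(row, xs)),
                 sum(a * s for a, s in zip(row, sol))) for row in rows]

    mode = rng.choice(["well_posed", "overconstrained", "underconstrained"])
    if mode == "overconstrained":
        # Duplicate an equation with a shifted right-hand side: no solution.
        eqs.append(sp.Eq(eqs[0].lhs, eqs[0].rhs + 1))
    elif mode == "underconstrained":
        eqs = eqs[:-1]  # drop a constraint: infinitely many solutions
    return eqs, xs, mode

eqs, xs, mode = gen_linear_system(seed=1)
print(mode, sp.linsolve(eqs, xs))  # the solver classifies the instance
```

Because the ground-truth solution and the applied perturbation are known at generation time, answers can be scored mechanically without any human labeling.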

Regex induction tasks challenge models to infer a regular expression from provided positive and negative examples, testing their ability to generalize from limited data; evaluation uses full-match semantics, ensuring a precise assessment of model comprehension. Because every instance is generated procedurally, the supply of novel training problems is effectively unlimited, constantly confronting models with fresh, diverse challenges. The work centers on three design principles: high-generality problem distributions, verification using external tools, and continuous difficulty control.
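Full-match semantics make the induction reward straightforward to check mechanically. The sketch below shows one plausible verifier using Python's re.fullmatch; the binary scoring and the name score_regex_induction are illustrative assumptions rather than the paper's exact reward shaping.

```python
import re

def score_regex_induction(candidate: str, positives, negatives) -> float:
    """Verifiable reward under full-match semantics: the candidate must
    fully match every positive example and reject every negative one."""
    try:
        pattern = re.compile(candidate)
    except re.error:
        return 0.0  # a syntactically invalid regex earns no reward
    ok_pos = all(pattern.fullmatch(s) for s in positives)
    ok_neg = all(pattern.fullmatch(s) is None for s in negatives)
    return 1.0 if ok_pos and ok_neg else 0.0

print(score_regex_induction(r"a+b", ["ab", "aab"], ["b", "abc"]))  # 1.0
```

Full matching (as opposed to substring search) closes a common loophole: a trivially permissive pattern like ".*" still matches every negative example in full and therefore scores zero.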

These features allow precise adjustment of problem complexity, enabling researchers to assess model performance across a wide spectrum of reasoning challenges. External tools, such as theorem provers, planning engines, and symbolic algebra systems, provide objective and unambiguous reward signals, which are crucial for robust RLVR training and for rigorously assessing complex outputs: they identify not only correct answers but also nuances like problem solvability and solution optimality. Measurements confirm that the difficulty control behaves as intended, with higher failure rates observed in the hard mode. Specifically, the work assessed performance across tasks including sequential induction, set equality, set intersection, set missing element, and theorem premise selection. The result is a powerful tool for developing robust, transferable reasoning skills in LLMs, overcoming the limitations of existing benchmarks. Future work may apply the environment to a wider range of models and explore different reinforcement learning algorithms to optimize reasoning performance.
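As an illustration of what continuous difficulty control could look like in practice, the sketch below maps a scalar difficulty in [0, 1] to generator parameters. The parameter names and ranges are invented for the example and are not taken from the paper.

```python
def difficulty_to_params(d: float) -> dict:
    """Sketch: map a continuous difficulty knob d in [0, 1] to the size
    parameters of a problem generator (hypothetical mapping)."""
    d = min(max(d, 0.0), 1.0)
    return {
        "n_facts": 3 + round(d * 9),     # 3 .. 12 propositions
        "n_actions": 2 + round(d * 10),  # 2 .. 12 actions
        "plan_depth": 1 + round(d * 7),  # 1 .. 8 required steps
    }

for d in (0.0, 0.5, 1.0):
    print(d, difficulty_to_params(d))
```

A continuous knob of this kind lets a training curriculum raise difficulty smoothly as the model improves, rather than jumping between a handful of fixed levels.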

👉 More information
🗞 Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
🧠 ArXiv: https://arxiv.org/abs/2509.18083
