Bio
I am a PhD student working on NLP, LLMs, and knowledge-intensive AI for industrial applications. In collaboration with Volkswagen, I focus on developing AI systems that support production planning and engineering decision-making.
My research investigates how expert knowledge from technical documents, standards, historical project data, and human experience can be incorporated into AI systems to make them more reliable, explainable, and effective in real-world industrial settings. In particular, I explore how LLMs can be combined with structured knowledge representations to build AI assistants that reason over domain knowledge and support experts in complex decision processes. My work also includes attribution-aware question answering, especially methods that improve the grounding and evidential support of LLM-generated responses.
Research Interests
- Natural Language Processing
- Large Language Models
- Retrieval-Augmented Generation
- Industrial AI
- Expert Knowledge Modeling
- Grounded, Evidence-Based Question Answering
Publications
2026
- ARQA: A Benchmark for Grounded Table–Text QA in Enterprise Annual Reports
Ruilong Wang and Simone Balloccu
In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), Mar 2026
Annual reports communicate corporate performance to stakeholders through dense tables and explanatory text, with rich grounding signals making automated reasoning challenging. Existing QA benchmarks focus on retrieval or single-modality reasoning and rarely require justification for answers with both textual and tabular evidence. We introduce ARQA (Annual Report QA), a benchmark of ~2.5K QA pairs spanning ten fiscal years of automotive enterprise annual reports and three reasoning families — Lookup, Arithmetic, and Insight. Data are produced via a planner–generator pipeline, deterministically verified and recomputed, and fully reviewed by domain experts. We evaluate state-of-the-art instruction-tuned language models on ARQA, showing strong factual retrieval but persistent weaknesses in grounded arithmetic and causal reasoning. We release ARQA and its evaluation toolkit to facilitate research on auditable, evidence-first reasoning over enterprise documents. (https://github.com/RuilongWang/ARQA-Benchmark/)