Ruilong Wang | Simone Balloccu

Position: PhD Student

Topics: Multimodal RAG systems for automotive applications

Bio

I am a PhD student working on NLP, LLMs, and knowledge-intensive AI for industrial applications. In collaboration with Volkswagen, I focus on developing AI systems that support production planning and engineering decision-making.

My research investigates how expert knowledge from technical documents, standards, historical project data, and human experience can be incorporated into AI systems to make them more reliable, explainable, and effective in real-world industrial settings. In particular, I explore how LLMs can be combined with structured knowledge representations to build AI assistants that reason over domain knowledge and support experts in complex decision processes. My work also includes attribution-aware question answering, especially methods that improve the grounding and evidential support of LLM-generated responses.

Research Interests

Natural Language Processing
Large Language Models
Retrieval-Augmented Generation
Industrial AI
Expert Knowledge Modeling
Grounded, Evidence-Based Question Answering

Publications

2026

ARQA: A Benchmark for Grounded Table–Text QA in Enterprise Annual Reports

Ruilong Wang and Simone Balloccu

In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), Mar 2026

Annual reports communicate corporate performance to stakeholders through dense tables and explanatory text whose rich grounding signals make automated reasoning challenging. Existing QA benchmarks focus on retrieval or single-modality reasoning and rarely require answers to be justified with both textual and tabular evidence. We introduce ARQA (Annual Report QA), a benchmark of ~2.5K QA pairs spanning ten fiscal years of automotive enterprise annual reports and three reasoning families: Lookup, Arithmetic, and Insight. The data are produced via a planner–generator pipeline, deterministically verified and recomputed, and fully reviewed by domain experts. We evaluate state-of-the-art instruction-tuned language models on ARQA, showing strong factual retrieval but persistent weaknesses in grounded arithmetic and causal reasoning. We release ARQA and its evaluation toolkit to facilitate research on auditable, evidence-first reasoning over enterprise documents. (https://github.com/RuilongWang/ARQA-Benchmark/)