About The Workshop
The field of AI is entering a new era of interaction, profoundly shaped by the capabilities of Large Language Models (LLMs). While multi-turn interaction has been a long-standing pursuit in AI—from dialogue systems to multi-agent coordination—the advent of LLMs has radically transformed this landscape. These models now engage in complex, long-horizon interactions, process diverse data, and make crucial decisions in dynamic, human-centric scenarios.
This leap forward, however, brings forth critical new research questions and challenges that demand immediate attention:
- Multi-Turn RL for Agentic Tasks: Learning from complex, interactive environments such as GUI agents and tool-use scenarios, despite the challenge of sparse rewards.
- Maintaining Alignment: Keeping models aligned with human values over extended, multi-turn interactions and preventing the "loss of alignment" observed in current models.
- Human-AI Interaction: Ensuring models adapt to users' goals over time without compromising safety or fairness.
- Long-Horizon Evaluation: Assessing LLMs' long-term capabilities, consistency, and strategic abilities in complex, multi-turn tasks.
The Workshop on Multi-Turn Interactions in LLMs is designed to be the central forum for addressing these pivotal questions. We invite researchers to contribute to defining the next generation of interactive AI, tackling these core challenges, and charting the course for future advancements in AI reasoning and planning. This workshop will concentrate on key areas where the extended use of LLMs presents both new challenges and opportunities, serving as a platform to discuss and refine methods for future improvements and evaluation for practical LLM use cases.
Topics
Our topics include but are not limited to:
Exploring diverse multi-turn interaction paradigms including human-AI, AI-AI, and AI-environment interactions. We welcome research on new multi-turn tasks, position papers on emerging interaction paradigms, and studies on complex scenarios like web agents, tool usage, simulations, and collaborative multi-agent systems. This includes GUI agents, conversational AI, interactive planning, and other long-horizon interactive tasks.
Novel methods and frameworks for multi-turn interactions, including various reinforcement learning approaches (PPO, GRPO, etc.), agent architectures, training pipelines, and algorithmic innovations. We seek research on addressing sparse rewards, effective credit assignment, improving training stability, rollout efficiency, and developing new RL methods specifically designed for long-horizon interactive settings with LLMs.
Long-horizon evaluation methods that assess consistency, stability, strategic ability, and performance degradation over extended interactions. This includes measuring and predicting performance on complex multi-turn tasks, identifying accumulating errors or unexpected behaviors, and creating comprehensive test environments. We encourage work building upon existing benchmarks like GAIA, TravelPlanner, τ-Bench, and ColBench, as well as developing new evaluation paradigms.
Addressing critical challenges in extended interactions, with a focus on maintaining alignment and safety over long-term interactions. This includes ensuring LLMs remain aligned with human values, maintaining consistent model personas, accurately tracking and adapting to users' changing goals, ensuring personalization does not compromise safety or fairness, and preventing advanced jailbreaking or hidden goal changes. We welcome research on coherence, personalization, trust, and other emerging challenges in multi-turn settings.
Call For Papers
The Workshop on Multi-Turn Interactions in LLMs @ NeurIPS 2025 invites submissions on the development of novel architectures, algorithms, theoretical analyses, empirical studies, and applications in multi-turn interactions with LLMs. Submissions must present original, unpublished research.
Key Dates
- Submission Deadline: September 2, 2025, AoE
- Review Period: September 5 - October 6, 2025
- Notification Date: October 8, 2025, AoE (previously September 22, 2025)
- Workshop Date: December 6, 2025
Submission Site
Submissions will be managed via OpenReview. Papers will remain private during the review process. All authors must maintain up-to-date OpenReview profiles to ensure proper conflict-of-interest management and paper matching. Incomplete profiles may result in desk rejection.
Learn how to create an OpenReview profile here.
Submit papers through the NeurIPS 2025 Workshop Submission Portal on OpenReview (Multi-Turn Interactions in LLMs Workshop Submission Portal).
Scope
We welcome contributions across a broad spectrum of topics related to our themes. Accepted papers will be presented as posters, with a subset selected for oral presentations. The workshop will take place in person at NeurIPS 2025, with virtual participation options to be confirmed.
Submission Guidelines
Formatting Requirements
Submissions must be in English and follow the NeurIPS 2025 Workshop LaTeX Template. Papers must be submitted as a single PDF file:
- Full Papers: at most 9 pages (main text)
- Short Papers: at most 4 pages (main text)
- References and appendices are not included in the page limit, but the main text must be self-contained. Reviewers are not required to read beyond the main text.
Submissions exceeding the page limit will be desk rejected.
Anonymity
The workshop follows a double-blind review process. Submissions must be anonymized by removing author names, affiliations, and acknowledgments. Prior work should be cited in the third person. Identifying information, including in supplementary materials, must be omitted.
Dual Submission and Non-Archival Policy
Papers currently under review at other venues (including NeurIPS and ICLR) are welcome at our workshop. Because the workshop is non-archival, papers accepted by another venue after our submission deadline do not need to be withdrawn. (An earlier version of this policy required withdrawal; that was a mistake, sorry!)
Transparency
By submitting to the workshop, authors agree that for all accepted papers, the original submission, reviews, and meta-reviews will be made publicly available on OpenReview.
Contact
Email us at multiturn-interactions-organizers@googlegroups.com
Speakers and Panelists
Confirmed Panelists
Schedule
Tentative workshop schedule. All talks include a Q&A session.
| Time (PST) | Session | Speaker | Talk Title |
|---|---|---|---|
| 08:50 – 09:00 | Opening Remarks | Organizers | |
| 09:00 – 09:30 | Invited Talk 1 | Dawn Song (UC Berkeley) | Challenges in Multi-Turn Safety Alignment |
| 09:30 – 10:00 | Invited Talk 2 | Natasha Jaques (UW & Google DeepMind) | Learning from Human-AI Interaction |
| 10:00 – 10:30 | Oral Presentation 1 | TBA | |
| 10:30 – 11:00 | Invited Talk 3 | Tim Rocktäschel (UCL & Google DeepMind) | Open-Endedness |
| 11:00 – 12:00 | Poster Session 1 | | |
| 12:00 – 13:30 | Lunch Break | | |
| 13:30 – 14:00 | Invited Talk 4 | Diyi Yang (Stanford) | Multi-Agent Learning |
| 14:00 – 14:30 | Invited Talk 5 | Jason Weston (Meta FAIR) | Multi-Turn RL for Agentic Tasks |
| 14:30 – 15:30 | Poster Session 2 | | |
| 15:30 – 16:00 | Oral Presentation 2 | TBA | |
| 16:00 – 16:30 | Invited Talk 6 | Yu Su (Ohio State University) | Planning Capabilities for GUI Agent Tasks |
| 16:30 – 17:30 | Panel Discussion | Dawn Song, Natasha Jaques, Tim Rocktäschel, Peter Henderson, Diyi Yang, Jason Weston | |
| 17:30 – 17:45 | Paper Award & Closing Remarks | Organizers | |
Accepted Papers
Oral Presentations
-
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar
-
RefineBench: Evaluating Refinement Capability in Language Models
Young-jun Lee, Seungone Kim, Byung-kwan Lee, Minkyeong Moon, Yechan Hwang, Jong Myoung Kim, Graham Neubig, Sean Welleck, Ho-jin Choi
-
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, Paul Pu Liang
-
Quantifying Information Gain and Redundancy in Multi-Turn LLM Conversations
Abhiram Rao Gorle, Amit Kumar Singh Yadav, Tsachy Weissman
Spotlight Presentations
-
ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark
Vaskar Nath, Pranav Vishnu Raja, Jane Yu, Claire Yoon, Sean M. Hendryx
-
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Zhenghai Xue, Longtao Zheng, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun Ma, Bo An
-
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations
Li Li, Peilin Cai, Ryan A. Rossi, Franck Dernoncourt, Branislav Kveton, Junda Wu, Tong Yu, Linxin Song, Tiankai Yang, Yuehan Qin, Nesreen K. Ahmed, Samyadeep Basu, Subhojyoti Mukherjee, Ruiyi Zhang, Zhengmian Hu, Bo Ni, Yuxiao Zhou, Zichao Wang, Yue Huang, Yu Wang, Xiangliang Zhang, Philip S. Yu, Xiyang Hu, Yue Zhao
-
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, Yi Wu
-
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities
Wendong Bu, Yang Wu, Qifan Yu, Minghe Gao, Bingchen Miao, Zhenkui Zhang, Kaihang Pan, Yunfei Li, Mengze Li, Wei Ji, Juncheng Li, Siliang Tang, Yueting Zhuang
-
GEM: A Gym for Agentic LLMs
Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Haotian Xu, Simon Yu, Chenmien Tan, Shaopan Xiong, Weixun Wang, Bo Liu, Hao Zhu, Weiyan Shi, Diyi Yang, Wee Sun Lee, Min Lin
-
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
Jingxu Xie, Dylan Xu, Xuandong Zhao, Dawn Song
-
Task Completion Agents are Not Ideal Collaborators
Shannon Zejiang Shen, Valerie Chen, Ken Gu, Alexis Ross, Zixian Ma, Jillian Ross, Alex Gu, Chenglei Si, Wayne Chi, Andi Peng, Jocelyn J Shen, Ameet Talwalkar, Tongshuang Wu, David Sontag
-
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
Zijian Chen, Xueguang Ma, Shengyao Zhuang, Ping Nie, Kai Zou, Sahel Sharifymoghaddam, Andrew Liu, Joshua Green, Kshama Patel, Ruoxi Meng, Mingyi Su, Yanxi Li, Haoran Hong, Xinyu Shi, Xuye Liu, Nandan Thakur, Crystina Zhang, Luyu Gao, Wenhu Chen, Jimmy Lin
-
WOLF: Werewolf-based Observations for LLM Deception and Falsehoods
Mrinal Agarwal, Saad Rana, Theo Sundoro, Hermela Berhe, Spencer Kim, Vasu Sharma, Sean O'brien, Kevin Zhu
Poster Presentations
-
ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models
Haziq Mohammad Khalid, Athikash Jeyaganthan, Timothy Do, Yicheng Fu, Vasu Sharma, Sean O'brien, Kevin Zhu
-
DeLLMphi: A Multi-Turn Method for Multi-Agent Forecasting
Andrew Robert Williams, Martin Weiss, Victoria Feere, Nasim Rahaman, Hugo Larochelle
-
CEDA: Cross-modal Evaluation through Debate Agents for Robust Hallucination Detection
Susmit Neogi, Wang Yun
-
RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users
Suyu Ye, Haojun Shi, Darren Shih, Hyokun Yun, Tanya G. Roosta, Tianmin Shu
-
Saying the Unsaid: Revealing the Hidden Language of Multimodal Systems Through Telephone Games
Juntu Zhao, Jialing Zhang, Chongxuan Li, Dequan Wang
-
Scalability of LLM-Based Multi-Agent Systems for Scientific Code Generation: A Preliminary Study
Yuru Wang, Kaiyan Zhang, Kai Tian, Sihang Zeng, Xingtai Lv, Ning Ding, Biqing Qi, Bowen Zhou
-
CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
Punya Syon Pandey, Yongjin Yang, Jiarui Liu, Zhijing Jin
-
Multi-Turn Human–LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol
Harshvardhan Mestha, Karan Bania, Shreyas V, Sidong Liu, Ashwin Srinivasan
-
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
Ye Liu, Kevin Qinghong Lin, Chang Wen Chen, Mike Zheng Shou
-
User-Assistant Bias in LLMs
Xu Pan, Jingxuan Fan, Zidi Xiong, Ely Hahami, Jorin Overwiening, Ziqian Xie
-
It's LIT! Reliability-Optimized LLMs with Inspectable Tools
Ruixin Zhang, Jon Donnelly, Zhicheng Guo, Ghazal Khalighinejad, Haiyang Huang, Alina Jade Barnett, Cynthia Rudin
-
Stability of Preference Alignment for Multi-Turn Control with LLM Policies
Andrew Silva, Pradyumna Tambwekar, Deepak Edakkattil Gopinath, Jonathan Decastro, Guy Rosman, Avinash Balachandran
-
Let’s Try Again: Eliciting Multi-Turn Reasoning in Language Models via Simplistic Feedback
Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li
-
FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management
Xiang Liu, Hong Chen, Xuming Hu, Xiaowen Chu
-
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Boyu Gou, Zanming Huang, Yuting Ning, Yu Gu, Michael Lin, Weijian Qi, Andrei Kopanev, Botao Yu, Bernal Jiménez Gutiérrez, Yiheng Shu, Chan Hee Song, Jiaman Wu, Shijie Chen, Hanane Nour Moussa, Tianshu Zhang, Jian Xie, Yifei Li, Tianci Xue, Zeyi Liao, Kai Zhang, Boyuan Zheng, Zhaowei Cai, Viktor Rozgic, Morteza Ziyadi, Huan Sun, Yu Su
-
Open-Universe Assistance Games
Rachel Ma, Jingyi Qu, Andreea Bobu, Dylan Hadfield-menell
-
Tracing Coordination Dynamics in Multi-Turn LLM Discussions
Angelina Parfenova, Jürgen Pfeffer, Alexander Denzler
-
Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models
Nimet Beyza Bozdag, Shuhaib Mehri, Gokhan Tur, Dilek Hakkani-tür
-
Improved Multi-Agent Collaboration with Multi-Turn Reinforcement Learning
Shuo Liu, Tianle Chen, Christopher Amato
-
REFRAG: Rethinking RAG based Decoding
Xiaoqiang Lin, Aritra Ghosh, Bryan Kian Hsiang Low, Anshumali Shrivastava, Vijai Mohan
-
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench
Venkatesh Mishra, Amir Saeidi, Satyam Raj, Mutsumi Nakamura, Jayanth Srinivasa, Gaowen Liu, Ali Payani, Chitta Baral
-
Probe by Gaming: A Game-based Benchmark for Assessing Conceptual Knowledge in LLMs
Shuhang Xu, Weijian Deng, Yixuan Zhou, Fangwei Zhong
-
Another Turn, Better Output? A Turn-Wise Analysis of Iterative LLM Prompting
Shashidhar Reddy Javaji, Bhavul Gauri, Zining Zhu
-
Efficient Reinforcement Learning for Optimizing Multi-turn Student Outcomes with LLM Tutors
Hyunji Nam, Omer Gottesman, Amy Zhang, Dean Foster, Emma Brunskill, Lyle Ungar
-
Collaborative Prediction: Tractable Information Aggregation via Agreement
Natalie Collina, Ira Globus-harris, Surbhi Goel, Varun Gupta, Aaron Roth, Mirah Shi
-
SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning
Aayush Aluru, Myra N. Malik, Samarth Patankar, Spencer Kim, Kevin Zhu, Vasu Sharma, Sean O'brien
-
State-Induced Risk Amplification of AI Agents
Rebecka Nordenlöw, Takayuki Osogami, Lauren Quigley, Sara E. Berger, Rachel K. E. Bellamy
-
MAC: A Multi-Agent Framework for Interactive User Clarification in Multi-turn Conversations
Emre Can Acikgoz, Jinoh Oh, Joo Hyuk Jeon, Jie Hao, Heng Ji, Dilek Hakkani-tür, Gokhan Tur, Xiang Li, Chengyuan Ma, Xing Fan
-
Exploring exploration with foundation agents in interactive environments
Daniel P. Sawyer, Nan Rosemary Ke, Hubert Soyer, Martin Engelcke, John Reid, David P Reichert, Drew Arad Hudson, Alexander Lerchner, Danilo Jimenez Rezende, Timothy P Lillicrap, Michael Curtis Mozer, Jane X Wang
-
PyVision: Agentic Vision with Dynamic Tooling
Shitian Zhao, Haoquan Zhang, Shaoheng Lin, Ming Li, Qilong Wu, Kaipeng Zhang, Chen Wei
-
Characterization and Detection of Incompleteness and Ambiguity in Multi-Turn Interactions with LLMs
Riya Naik, Ashwin Srinivasan, Swati Agarwal, Estrid He
-
Alignment via Competition: Emergent Alignment from Differently Misaligned Agents
Natalie Collina, Surbhi Goel, Aaron Roth, Emily Ryu, Mirah Shi
-
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra
-
The Influence of Scaffolds on Coordination Scaling Laws in LLM Agents
Mariana Meireles, Rupali Bhati, Niklas Lauffer, Cameron Allen
-
CaRT: Teaching LLM Agents to Know When They Know Enough
Grace Liu, Yuxiao Qu, Jeff Schneider, Aarti Singh, Aviral Kumar
-
RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Chenyang Zhu, Spencer Hong, Jingyu Wu, Kushal Chawla, Yuhui Tang, Youbing Yin, Nathan Wolfe, Erin Babinsky, Daben Liu
-
MAREval: A Multi-Agent Framework for Evaluating Natural Language Recommendation Explanations
Reza Yousefi Maragheh, Jayesh Uddhav Kudase, Aysenur Inan, Ramin Giahi, Kai Zhao, Jianpeng Xu, Jason Cho, Evren Korpeoglu, Sushant Kumar
-
ObjexMT: Objective Extraction and Metacognitive Calibration for LLM‑as‑a‑Judge under Multi‑Turn Jailbreaks
Hyunjun Kim, Junwoo Ha, Haon Park, Sangyoon Yu
-
Estimating the Empowerment of Language Model Agents
Jinyeop Song, Jeff Gore, Max Kleiman-weiner
-
ExploraTutor: A Dataset for Children’s Exploratory Dialogue by Integrating Multiple Educational theories
Siqi Xie, Yaxin Xu
-
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
Zonghai Yao, Talha Chafekar, Junda Wang, Shuo Han, Feiyun Ouyang, Junhui Qian, Lingxi Li, Hong Yu
-
$\textit{The Traitors}$: Deception and Trust in Multi-Agent Language Model Simulations
Pedro M. P. Curvo
-
LLM Rationalis? Measuring bargaining capabilities of AI negotiators
Cheril Shah, Akshit Agarwal, Kanak Garg, Mourad Heddaya
-
Reinforced Reasoning for Interactive Multi-step Embodied Planning
Di Wu, Jiaxin Fan, Junzhe Zang, Guanbo Wang, Wei Yin, Wenhao Li, Bo Jin
-
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
Alexander Duffy, Samuel J Paech, Ishana Shastri, Elizabeth Karpinski, Baptiste Alloui-cros, Matthew Lyle Olson, Tyler Marques
-
Multi-Agent-as-Judge: Aligning LLM-Agent-Based Automated Evaluation with Multi-Dimensional Human Evaluation
Jiaju Chen, Yuxuan Lu, Xiaojie Wang, Huimin Zeng, Jing Huang, Jiri Gesi, Ying Xu, Dakuo Wang
-
SkyRL-SQL: Multi-turn SQL Data Agents via RL
Shu Liu, Alan Zhu, Sumanth Hegde, Shiyi Cao, Shuo Yuan, Samion Suwito, Tyler Griggs, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica
-
Traxgen: Ground-Truth Trajectory Generation for AI Agent Evaluation
Maria Emilia Mazzolenis, Ruirui Zhang
-
The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets
Shenzhe Zhu, Jiao Sun, Yi Nian, Tobin South, Alex Pentland, Jiaxin Pei
-
Leveraging In-Context Learning for Language Model Agents
Shivanshu Gupta, Sameer Singh, Ashish Sabharwal, Tushar Khot, Ben Bogin
-
A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation
Hong Je-gal, Chanbin Yi, Hyun-suk Lee
-
Learning to be Proactive from Missed User-Signals in Multi-turn Dialogues
Saba Rahimi, Sivapriya Vellaichamy, Kelly Patel, Thomas Cook, Zhen Zeng, Sumitra Ganesh
-
Multi-Turn LLM Systems for Diagnostic Decision-Making: Considerations, Biases, and Challenges
Benjamin Liu, Sejong Kim, Drona Thoka, Varun Puttagunta, Kaylin Sheng, Mark Li, Kiran Nijjer, Adnan Ahmed, Thi Uyen Hanh Le, Sai Chidvilas Gudiboina, Ali Ugur, Kevin Zhu
-
$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
Deyu Zou, Yongqiang Chen, Jianxiang Wang, Garry Yang, Mufei Li, Qing Da, Pan Li, Yu Gong, James Cheng
-
OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs
Yifu Lu, Shengjie Liu, Li Dong
-
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Yubo Li, Yidi Miao, Xueying Ding, Ramayya Krishnan, Rema Padman
-
Towards Trajectory-Level Alignment: Detecting Intent Drift in Long-Horizon LLM Dialogues
Jianming Lai
-
StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production–Living Simulations with Stardew Valley
Weihao Tan, Changjiu Jiang, Yu Duan, Mingcong Lei, Li Jiageng, Yitian Hong, Xinrun Wang, Bo An
-
AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI
Manik Rana, Calissa Man, Anotida Expected Msiiwa, Jeffrey Paine, Ahan M R
-
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
Ruiyi Wang, Prithviraj Ammanabrolu
-
One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning
Ritesh Goru, Shanay Mehta, Prateek Jain
-
Verlog: Context-lite Multi-turn Reinforcement Learning framework for Long-Horizon LLM Agents
Wentse Chen, Jiayu Chen, Hao Zhu, Jeff Schneider
-
Disclosure Audits for LLM Agents
Saswat Das, Jameson Sandler, Ferdinando Fioretto
-
Do Large Language Models Defend Their Beliefs Consistently?
Arka Pal, Arthur Liang, Teo Kitanovski, Akilesh Potti, Micah Goldblum
-
SENTINEL: Sentiment Evolution and Narrative Tracking in Extended LLM Interactions
Pranav Anuraag, Ethan Xu, Alexander Arutchev, Asher Nerenberg
-
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
Mohammad Akbar-tajari, Mohammad Taher Pilehvar, Mohammad Mahmoody
-
Optimizing for Persuasion Improves LLM Generalization: Evidence from Quality-Diversity Evolution of Debate Strategies
Aksel Joonas Reedi, Corentin Léger, Julien Pourcel, Loris Gaven, Perrine Charriau, Guillaume Pourcel
-
Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users
Melik Ozolcer, Sang Won Bae
-
Goal Alignment in LLM-Based User Simulators for Conversational AI
Shuhaib Mehri
-
Show or Tell? Interactive Task Learning with Large Language Models
Jacob Sansom, Muhammad Khalifa, Honglak Lee, Joyce Chai
-
AURA: A Diagnostic Framework for Tracking User Satisfaction of Interactive Planning Agents
Takyoung Kim, Janvijay Singh, Shuhaib Mehri, Emre Can Acikgoz, Sagnik Mukherjee, Nimet Beyza Bozdag, Sumuk Shashidhar, Gokhan Tur, Dilek Hakkani-tür
-
Automating Deception: Scalable Multi-Turn LLM Jailbreaks
Adarsh Kumarappan, Ananya Mujoo
-
OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation
Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-jun Li, Lydia Chilton, Dakuo Wang
-
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping
-
Stop-RAG: Value-Based Retrieval Control for Iterative RAG
Jaewan Park, Solbee Cho, Jay-yoon Lee
-
Orchestrator: Active Inference for Multi-Agent Systems in Long-Horizon Tasks
Lukas Beckenbauer, Johannes-lucas Löwe, Ge Zheng, Alexandra Brintrup
-
Reinforcement Learning for Long-Horizon Multi-Turn Search Agents
Vivek Kalyan, Martin Andrews
-
Sotopia-RL: Reward Design for Social Intelligence
Haofei Yu, Zhengyang Qi, Yining Zhao, Kolby Nottingham, Keyang Xuan, Bodhisattwa Prasad Majumder, Hao Zhu, Paul Pu Liang, Jiaxuan You
-
Customer-R1: personalized simulation of Human Behaviors via RL-based LLM Agent in Online Shopping
Ziyi Wang, Yuxuan Lu, Yimeng Zhang, Jing Huang, Dakuo Wang
-
PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
Weizhi Zhang, Xinyang Zhang, Chenwei Zhang, Liangwei Yang, Jingbo Shang, Zhepei Wei, Henry Peng Zou, Zijie Huang, Zhengyang Wang, Yifan Gao, Xiaoman Pan, Lian Xiong, Jingguo Liu, Philip S. Yu, Xian Li
-
Language Models Rate Their Own Actions As Safer
Dipika Khullar, Jack Hopkins, Rowan Wang, Fabien Roger
-
PrefDisco: Evaluating Proactive Personalization through Interactive Preference Discovery
Shuyue Stella Li, Avinandan Bose, Faeze Brahman, Simon Shaolei Du, Pang Wei Koh, Maryam Fazel, Yulia Tsvetkov
-
Conformity, Inertia, and Value Alignment in Multi-Turn LLM Deliberation
Pratik S. Sachdeva, Tom Van Nuenen
-
WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
Yaoyao Qian, Yuanli Wang, Jinda Zhang, Yun Zong, Meixu Chen, Hanhan Zhou, Jindan Huang, Yifan Zeng, Xinyu Hu, Chan Hee Song, Danqing Zhang
-
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks
Sagnik Anupam, Davis Brown, Shuo Li, Eric Wong, Hamed Hassani, Osbert Bastani
-
Benchmarking Correctness and Security in Multi-Turn Code Generation
Ruchit Rawal, Jeffrey Yang Fan Chiang, Jeffery Siyuan Tian, Aastha Mahajan, Tom Goldstein, Yizheng Chen
-
The Chameleon Nature of LLMs: Quantifying Multi-Turn Stance Instability in Search-Enabled Language Models
Shivam Ratnakar, Sanjay Raghavendra
-
Are LLMs Generalist Hanabi Agents?
Mahesh Ramesh, Aswinkumar Ramkumar, Pavan Thodima, Kaousheik Jayakumar, Aniket Rege
-
WEBSERV: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale
Yuxuan Lu, Jing Huang, Hui Liu, Jiri Gesi, Yan Han, Shihan Fu, Tianqi Zheng, Dakuo Wang
-
AsymPuzl: An Asymmetric Puzzle for multi-agent cooperation
Xavier Cadet, Edward Koh, Peter Chin
-
How to Train Your LLM Web Agent: A Statistical Diagnosis
Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier De Chezelles, Nicolas Gontier, Miguel Muñoz-mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia
-
Modeling and Predicting Multi-Turn Answer Instability in Large Language Models
Jiahang He, Rishi Ramachandran, Neel Ramachandran, Aryan Katakam, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Aryan Shrivastava
-
CRMWeaver: Building Powerful Business Agent via Agentic RL and Shared Memories
Yilong Lai, Yipin Yang, Jialong Wu, Zhenglin Wang, Ting Liang, Linjianguo, Keping Yang
-
Toward Community-Driven Agents for Machine Learning Engineering
Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang
-
Semantic Context for Tool Orchestration
Robert Müller
-
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Chenxing Wei, Hong Wang, Ying Tiffany He, Fei Yu, Yao Shu
-
TOD-ProcBench: Benchmarking Complex Instruction-Following in Task-Oriented Dialogues
Sarik Ghazarian, Abhinav Gullapalli, Swair Shah, Anurag Beniwal, Nanyun Peng, Narayanan Sadagopan, Zhou Yu
-
Large Language Models Develop Novel Social Biases Through Adaptive Exploration
Addison J. Wu, Ryan Liu, Xuechunzi Bai, Thomas L. Griffiths
-
MELISSA: Multi-level Evaluation with LLM-based Integrated Self-Scrutiny and Auditing
Amirhossein Afsharrad, Sri Jaladi, Nima Yazdani, Ali Ansari, Seyed Shahabeddin Mousavi, Sanjay Lall
-
Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies
Prasoon Varshney, Makesh Narsimhan Sreedhar, Liwei Jiang, Traian Rebedea, Christopher Parisien
-
ParetoMIL: Early Risk Detection in Dialogue under Weak Supervision
Avinash Baidya, Xinran Liang, Ruocheng Guo, Kamalika Das, Xiang Gao
-
MultiScale Contextual Bandits for Long Term Objectives
Richa Rastogi, Yuta Saito, Thorsten Joachims
-
ConDABench: Interactive Evaluation of Language Models for Data Analysis
Avik Dutta, Priyanshu Gupta, Hosein Hasanbeig, Rahul Pratap Singh, Harshit Nigam, Sumit Gulwani, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari
-
AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs
María Victoria Carro, Denise Alejandra Mester, Facundo Nieto, Oscar Agustín Stanchi, Guido Ernesto Bergman, Mario Leiva, Luca Nicolás Forziati Gangi, Eitan Sprejer, Francisca Gauna Selasco, Juan Gustavo Corvalan, Maria Vanina Martinez, Gerardo Simari
-
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira
-
Studying Coordination and Collusion in Multi-Agent LLM Code Reviews
Jennifer Za, Aristeidis Panos, Roger Dearnaley, Samuel Albanie
-
Delay-of-Gratification as a Multi-Agent Survival Micro-benchmark for Long-Horizon LLMs: Social Exposure, Personas, and Tool Use Budgets
Olga Manakina, Igor Bogdanov, Chung-horng Lung
-
Improving Language Agents through BREW: Bootstrapping expeRientially-learned Environmental knoWledge
Shashank Kirtania, Param Biyani, Priyanshu Gupta, Yasharth Bajpai, Roshni Iyer, Sumit Gulwani, Gustavo Soares
-
Fathom-Search-4B: Scaling DeepSearch Reasoning Capabilities via RL
Shreyas Singh, Kunal Singh, Pradeep Moturi
Student Registration Grant
The Multi-Turn Interactions in LLMs (MTI-LLM) Workshop at NeurIPS 2025 is pleased to announce limited financial support for student authors of accepted workshop papers.
Thanks to our generous sponsors — Meta AI and Orby AI (now Uniphore) — we are offering registration reimbursement grants to ensure broader student participation in the NeurIPS community.
💡 Eligibility & Coverage
- Only the student registration fee for the workshop ($175 early, $230 late) is eligible for reimbursement; 10-15 grants are available.
- Support will be issued via reimbursement after the conference (not as direct payment).
- Travel, accommodation, or other expenses are not covered under this program.
💬 Reimbursement Policy
Reimbursements will be processed after NeurIPS 2025 upon submission of a valid payment receipt for the student registration.
This ensures funds are distributed fairly to participants who attend and present at the workshop.
🗓️ Key Dates
- Application Deadline: October 20, 2025
- Notification of Results: October 27, 2025
🎯 Priority Considerations
Preference will be given to:
- students presenting accepted papers at MTI-LLM @ NeurIPS 2025
- students without other funding support (e.g., from advisor or institution)
- students from underrepresented regions or institutions
Registration Grant Recipients
- Huiqi Zou - Northeastern University - PhD
- Arthur Liang - MIT - Undergraduate
- Adarsh Kumarappan - California Institute of Technology - Undergraduate
- Moises Andrade - Georgia Institute of Technology - Incoming PhD Student
- Mrinal Agarwal - Algoverse - Junior at Emerald High School
- Yaoyao Qian - Northeastern University - Master's
- Junhong Shen - Carnegie Mellon University - PhD
- Xueguang Ma - University of Waterloo - PhD
- Li Li - University of Southern California - PhD
- Ao Qu - Massachusetts Institute of Technology - PhD
- Jingxu Xie - UC Berkeley - PhD
- Young-Jun Lee - KAIST - PhD
- Hao Bai - UIUC - PhD
- Zhiyuan Hu - MIT - PhD
Organizers
This workshop is organized by
Sponsors
Meta
Uniphore / OrbyAI
Partners
Prime Intellect
