Uncertainty Quantification and Confidence Calibration
in Large Language Models

KDD 2025 Lecture-Style Tutorial

Sunday, August 3, 8:00 AM – 11:00 AM

Location: MTCC, 255 Front Street West, Toronto, Ontario, Canada

Abstract

Large Language Models (LLMs) excel in text generation, reasoning, and decision-making, enabling their adoption in high-stakes domains such as healthcare, law, and transportation. However, their reliability is a major concern, as they often produce plausible but incorrect responses. Uncertainty quantification (UQ) enhances trustworthiness by estimating confidence in outputs, enabling risk mitigation and selective prediction.

Traditional UQ methods, however, struggle with LLMs due to computational constraints and decoding inconsistencies. Moreover, LLMs introduce unique uncertainty sources, such as input ambiguity, reasoning path divergence, and decoding stochasticity, that extend beyond classical aleatoric and epistemic uncertainty. To address this, we introduce a new taxonomy that categorizes UQ methods along two axes: computational efficiency and the dimension of uncertainty targeted (input, reasoning, parameter, and prediction uncertainty). We evaluate existing techniques, assess their real-world applicability, and identify open challenges, emphasizing the need for scalable, interpretable, and robust UQ approaches to enhance LLM reliability.
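
To make the prediction-uncertainty dimension concrete, the sketch below estimates an LLM's confidence by sampling several answers and measuring their agreement, then abstains when the normalized entropy exceeds a threshold (selective prediction). This is a minimal illustration, not a method from the tutorial: sample_answers is a hypothetical placeholder for any stochastic, temperature-sampled decoding call.

    import math
    from collections import Counter

    def sample_answers(prompt, n=10):
        # Hypothetical stand-in for n stochastic LLM decodes of `prompt`;
        # replace with real temperature-sampled generations from a model API.
        return ["Paris", "Paris", "Paris", "Paris", "Paris",
                "Paris", "Paris", "Paris", "Paris", "Lyon"][:n]

    def predictive_uncertainty(answers):
        # Normalized entropy of the empirical answer distribution:
        # 0.0 when all samples agree, 1.0 when they are maximally spread.
        counts = Counter(answers)
        total = len(answers)
        if len(counts) == 1:
            return 0.0
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        return entropy / math.log(len(counts))

    def selective_predict(prompt, threshold=0.5):
        # Answer with the majority vote, or abstain when uncertainty is high.
        answers = sample_answers(prompt)
        u = predictive_uncertainty(answers)
        if u > threshold:
            return None, u  # abstain: defer to a human or a stronger fallback
        return Counter(answers).most_common(1)[0][0], u

    answer, u = selective_predict("What is the capital of France?")
    print(answer, round(u, 3))  # -> Paris 0.469 (below threshold, so answered)

In practice, free-form LLM outputs would first need semantic clustering of paraphrases before counting agreement, one of the LLM-specific complications the taxonomy addresses.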

Schedule

Tutorial Materials

Contributors

Xiaoou Liu

Xiaoou Liu is a second-year Ph.D. student in Computer Science at Arizona State University. Her research focuses on trustworthy machine learning, with an emphasis on explainable graph neural networks and uncertainty quantification in large language models. Her work has been published at venues such as KDD and ICCPS.

Tiejin Chen

Tiejin Chen is a third-year Ph.D. student in Computer Science at Arizona State University. His research interests lie in trustworthy AI and its applications, with a focus on LLM reliability and safety. He has published in top venues including KDD, ACL, AAAI, and SDM, and actively serves as a reviewer for major conferences in machine learning and data mining.

Longchao Da

Longchao Da is a fourth-year Ph.D. candidate at Arizona State University. His research interests are reinforcement learning and trustworthy AI. He has publications in top venues such as NeurIPS, ICML, KDD, AAAI, IJCAI, ECML-PKDD, CIKM, CDC, CASE, IJMLC, Machine Learning, and SDM. He hosted an in-person, hands-on tutorial at ITSC 2023 in Spain, a leading interdisciplinary venue, with more than 60 participants.

Zhen Lin

Zhen Lin completed his Ph.D. at the University of Illinois Urbana-Champaign and is now a researcher at Jump Trading. His research interests are uncertainty quantification for deep learning and LLMs. He has publications in top venues such as NeurIPS, ICLR, ICML, KDD, AAAI, and TMLR, and co-hosted an in-person tutorial at KDD 2023.

Chacha Chen

Chacha Chen is a Ph.D. candidate in Computer Science at the University of Chicago’s Human+AI Lab, advised by Chenhao Tan. Her research centers on understanding and adapting large language models for high-stakes, knowledge-intensive domains such as healthcare, and on designing human-AI workflows that genuinely enhance expert decision-making. She has published at KDD, WWW, FAccT, ICLR, AAAI, ML4H, and more, and has been recognized with the Best Applied Data Science Paper Award at ECML-PKDD 2020 and a Best Paper Award at HMCaT @ ICML 2022.

Hua Wei

Hua Wei is an Assistant Professor in the School of Computing and Augmented Intelligence at Arizona State University. His research focuses on data mining, reinforcement learning, and uncertainty quantification. He has received an Amazon Research Award for LLM uncertainty quantification and multiple Best Paper Awards at top conferences in machine learning, artificial intelligence, and data mining. He also actively organizes events related to uncertainty and LLMs, such as the Workshop on Uncertainty Reasoning and Quantification in Decision Making and the Agent4IR workshops at CIKM and KDD.