Research
My research aims to advance scientific understanding of AI (especially neural models like LLMs), and more broadly, the general principles of intelligence and intelligent behavior. I approach this goal across three interconnected levels:
- Behavioral level — analyzing how models and humans reason, generalize, and solve problems, including studies of alignment, limitations, and trustworthy reasoning.
- Mechanistic level — interpreting model internals to understand the circuits, representations, and algorithms that give rise to intelligent behavior and drive observable performance and failures.
- Social level — investigating how intelligence emerges and interacts in multi-agent systems and human–AI collaborations.
These three levels parallel psychology, neuroscience, and social science in their study of human intelligence and behavior.
If any of this resonates with your interests, feel free to reach out; I'd love to connect and collaborate!
Selected Publications
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Pengrui Han*, Rafal D. Kocielnik*, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, and R. Michael Alvarez (* Equal Contribution)
International Conference on Machine Learning (ICML), 2026
NeurIPS LAW Workshop: Bridging Language, Agent, and World Models, 2025, Best Paper Honorable Mention
arXiv / project / code / media
LLMs say they have personalities, but they don’t act like it. Alignment today shapes language, not behavior. This linguistic–behavioral dissociation cautions against equating coherent self-reports with cognitive depth.
Large Language Model Reasoning Failures
Peiyang Song*, Pengrui Han*, and Noah Goodman (* Equal Contribution)
Transactions on Machine Learning Research (TMLR), 2026, Survey Certificate
arXiv / code / proceeding / media
We present the first comprehensive survey dedicated to reasoning failures in LLMs. By unifying fragmented research efforts, our survey offers a structured perspective on systemic weaknesses in LLM reasoning and guides future research toward stronger, more reliable, and more robust reasoning capabilities.
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
Pengrui Han*, Peiyang Song*, Haofei Yu, and Jiaxuan You (* Equal Contribution)
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024
arXiv / code / proceeding
Motivated by the crucial cognitive phenomenon of A-not-B errors, we present the first systematic evaluation of the surprisingly vulnerable inhibitory control abilities of LLMs. We reveal that this weakness undermines LLMs' trustworthy reasoning across diverse domains, and introduce various mitigations.
ChatGPT-Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
Pengrui Han*, Rafal Kocielnik*, Adhithya Saravanan, Roy Jiang, Or Sharir, and Anima Anandkumar (* Equal Contribution)
Conference On Language Modeling (COLM), 2024
arXiv / code / proceeding
We propose a light and efficient pipeline that enables both domain and non-domain experts to quickly generate synthetic debiasing data to mitigate specific or general bias in their models with parameter-efficient fine-tuning.
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Guanyu Lin*, Tao Feng*, Pengrui Han*, Ge Liu, Jiaxuan You (* Equal Contribution)
System Demonstration Track of Empirical Methods in Natural Language Processing (EMNLP), 2024
Hugging Face Live Demo: Link
We present Paper Copilot, a self-evolving and efficient LLM system that provides personalized academic assistance.
Selected Awards
- TMLR Survey Certification (2026)
- NeurIPS LAW Workshop Best Paper Honorable Mention Award (2025)
- Phi Beta Kappa Honor Society (2025)
- Carleton College Chang-Lan Award (2024)
- Caltech SURF Award (2023)
- Carleton College Dean's List (2023)
Selected Media
- 'Not how you build a digital mind': How reasoning failures are preventing AI models from achieving human-level intelligence, Live Science, 2026
- Scientists Found AI’s Fatal Flaw—The Most Advanced Models Are Failing Basic Logic Tests, Popular Mechanics, 2026
- New Framework Simplifies the Complex Landscape of Agentic AI, VentureBeat, 2025
- This AI Paper Explains Why Most "Agentic AI" Systems Feel Impressive in Demos and then Completely Fall Apart in Real Use, MarkTechPost, 2025
- Researchers Discover "Personality Illusion" to Reveal a Profound Disconnect Between Language and Behavior in LLMs, MIT Technology Review China, 2025
Teaching
- CS 411: Database Systems, Teaching Assistant @ UIUC, Spring 2026
- CS 512: Data Mining Principles, Teaching Assistant @ UIUC, Fall 2025
- MATH 241: Ordinary Differential Equations, Teaching Assistant @ Carleton College, Fall 2024
- MATH 321: Real Analysis, Teaching Assistant @ Carleton College, Spring 2024
- MATH 232: Linear Algebra, Teaching Assistant @ Carleton College, Spring 2023
- MATH 232: Linear Algebra, Teaching Assistant @ Carleton College, Winter 2023
Academic Services
- Reviewer for conferences: ICLR, ICML, NeurIPS, ACL, COLM, COLING
- Reviewer for workshops: Re-Align, LLM-Cognition, BehaviorML, LTEDI, INTERPLAY, AI4Math, LatinX, Assessing World Models