|
News
[Sep 2025] Honored to receive the Best Paper Honorable Mention Award @ NeurIPS LAW Workshop for our Personality Illusion paper.
[Dec 2025] I am attending NeurIPS 2025 in San Diego, CA, from Dec 2 to Dec 7. Excited to catch up with old and new friends!
[Sep 2025] 🔥 We released The Personality Illusion, our work showing that LLMs do not have personalities in the way humans do.
[Jul 2025] Our paper on LLM Reasoning Failures has been accepted to the ICML AI for Math Workshop. Stay tuned for the full release!
[Apr 2025] I am joining Prof. Evelina Fedorenko's group at MIT starting April 2025, working on language and thought in LLMs.
|
Research
My research aims to advance the scientific understanding of AI (especially neural models like LLMs) and, more broadly, of the general principles of intelligence and intelligent behavior. I approach this goal across three interconnected levels:
- Behavioral level — analyzing how models and humans reason, generalize, and solve problems, including studies of alignment, limitations, and trustworthy reasoning.
- Mechanistic level — interpreting model internals to understand the circuits, representations, and algorithms that give rise to intelligent behavior and drive observable performance and failures.
- Social level — investigating how intelligence emerges and interacts in multi-agent systems and human–AI collaborations.
These three levels parallel psychology, neuroscience, and social science in their study of human intelligence and behavior.
If any of this resonates with your interests, feel free to reach out so we can connect and collaborate!
|
Selected Publications
|
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Pengrui Han*, Rafal D. Kocielnik*, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, and R. Michael Alvarez (* Equal Contribution)
NeurIPS LAW Workshop: Bridging Language, Agent, and World Models, 2025, Oral Presentation + Best Paper Honorable Mention
NeurIPS Workshop on LLM Persona Modeling (PersonaNLP), 2025, Oral Presentation
arXiv / project / code
LLMs say they have personalities, but they don’t act like it. Alignment today shapes language, not behavior. This linguistic–behavioral dissociation cautions against equating coherent self-reports with cognitive depth.
|
A Survey on Large Language Model Reasoning Failures
Peiyang Song*, Pengrui Han*, and Noah Goodman (* Equal Contribution)
ICML AI for Math Workshop, 2025
preprint / full release coming soon
We present the first comprehensive survey dedicated to reasoning failures in LLMs. By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering insights that guide future research toward stronger, more reliable, and more robust reasoning capabilities.
|
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
Pengrui Han*, Peiyang Song*, Haofei Yu, and Jiaxuan You (* Equal Contribution)
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
code
Motivated by the crucial cognitive phenomenon of A-not-B errors, we present the first systematic evaluation of the surprisingly vulnerable inhibitory control abilities of LLMs. We reveal that this weakness undermines LLMs' trustworthy reasoning capabilities across diverse domains, and introduce various mitigations.
|
ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
Pengrui Han*, Rafal Kocielnik*, Adhithya Saravanan, Roy Jiang, Or Sharir, and Anima Anandkumar (* Equal Contribution)
Conference on Language Modeling (COLM), 2024
code
We propose a light and efficient pipeline that enables both domain and non-domain experts to quickly generate synthetic debiasing data to mitigate specific or general bias in their models with parameter-efficient fine-tuning.
|
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Guanyu Lin*, Tao Feng*, Pengrui Han*, Ge Liu, and Jiaxuan You (* Equal Contribution)
System Demonstrations Track of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Hugging Face Live Demo: Link
We present Paper Copilot, a self-evolving and efficient LLM system that provides personalized academic assistance; a live demo is available on Hugging Face.
|
Thought-Retriever: Don’t Just Retrieve Raw Data, Retrieve Thoughts
Tao Feng*, Pengrui Han*, Guanyu Lin*, Ge Liu, and Jiaxuan You (* Equal Contribution)
ICLR Workshop on How Far Are We From AGI, 2024
We introduce Thought-Retriever, a novel model-agnostic algorithm that enables LLMs to effectively utilize external data without being limited by context length.
|
Exploring Social Bias in Downstream Applications of Text-to-Image Foundation Models
Adhithya Saravanan, Rafal Kocielnik, Roy Jiang, Pengrui Han, and Anima Anandkumar
NeurIPS Workshop on Failure Modes in the Age of Foundation Models, 2023
Proceedings of Machine Learning Research (PMLR)
We explore social biases in text-to-image diffusion models used in commercial applications such as image editing. By analyzing models like Stable Diffusion, we uncover significant biases, emphasizing the need for careful consideration when adopting these technologies for broader use.
|
Selected Awards
- NeurIPS LAW Workshop Best Paper Honorable Mention Award (2025)
- Phi Beta Kappa Honor Society (2025)
- Carleton College Chang-Lan Award (2024)
- Caltech SURF Award (2023)
- Carleton College Dean's List (2023)
|
Academic Services
- Reviewer for conferences: ICLR, ICML, COLM, COLING.
- Reviewer for workshops: Re-Align, LLM-Cognition, BehaviorML, LTEDI, INTERPLAY, AI4Math, LatinX, Assessing World Models.
|