Research
My research aims to advance scientific understanding of AI (especially neural models such as LLMs) and, more broadly, the general principles of intelligence and intelligent behavior. I approach this goal across three interconnected levels:
- Behavioral level: analyzing how models and humans reason, generalize, and solve problems, including studies of alignment, limitations, and trustworthy reasoning.
- Mechanistic level: interpreting model internals to understand the circuits, representations, and algorithms that give rise to intelligent behavior and drive observable performance and failures.
- Social level: investigating how intelligence emerges and interacts in multi-agent systems and human–AI collaborations.
These three levels parallel psychology, neuroscience, and social science in their study of human intelligence and behavior.
If any of this resonates with your interests, feel free to reach out; I would love to connect and collaborate!
Selected Publications
The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs
Pengrui Han*, Rafal D. Kocielnik*, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, and R. Michael Alvarez (* Equal Contribution)
ICML Workshop on Models of Human Feedback for AI Alignment (MoFA), 2025
arXiv / project / code
LLMs say they have personalities, but they don’t act like it. Alignment today shapes language, not behavior. This linguistic–behavioral dissociation cautions against equating coherent self-reports with cognitive depth.
A Survey on Large Language Model Reasoning Failures
Peiyang Song*, Pengrui Han*, and Noah Goodman (* Equal Contribution)
ICML AI for Math Workshop, 2025
preprint / full release coming soon
We present the first comprehensive survey dedicated to reasoning failures in LLMs. By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning and guides future work toward stronger, more reliable, and more robust reasoning capabilities.
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
Pengrui Han*, Peiyang Song*, Haofei Yu, and Jiaxuan You (* Equal Contribution)
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024
code
Motivated by the classic cognitive phenomenon of A-not-B errors, we present the first systematic evaluation of the surprisingly weak inhibitory control abilities of LLMs. We show that this weakness undermines LLMs' trustworthy reasoning across diverse domains, and we introduce several mitigations.
ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
Pengrui Han*, Rafal Kocielnik*, Adhithya Saravanan, Roy Jiang, Or Sharir, and Anima Anandkumar (* Equal Contribution)
Conference on Language Modeling (COLM), 2024
code
We propose a lightweight and efficient pipeline that enables both domain and non-domain experts to quickly generate synthetic debiasing data to mitigate specific or general bias in their models with parameter-efficient fine-tuning.
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Guanyu Lin*, Tao Feng*, Pengrui Han*, Ge Liu, and Jiaxuan You (* Equal Contribution)
System Demonstration Track of Empirical Methods in Natural Language Processing (EMNLP), 2024
Hugging Face Live Demo: Link
We present Paper Copilot, a self-evolving and efficient LLM system that provides personalized academic assistance.
Thought-Retriever: Don’t Just Retrieve Raw Data, Retrieve Thoughts
Tao Feng*, Pengrui Han*, Guanyu Lin*, Ge Liu, and Jiaxuan You (* Equal Contribution)
ICLR Workshop on How Far Are We From AGI, 2024
We introduce Thought-Retriever, a novel model-agnostic algorithm that enables LLMs to effectively utilize external data without being limited by context length.
Exploring Social Bias in Downstream Applications of
Text-to-Image Foundation Models
Adhithya Saravanan, Rafal Kocielnik, Roy Jiang, Pengrui Han, and Anima Anandkumar
NeurIPS Workshop on Failure Modes in the Age of Foundation Models, 2023
Proceedings of Machine Learning Research (PMLR)
We explore the social biases in text-to-image diffusion models used in commercial applications like image editing. By analyzing models like Stable Diffusion, we uncover significant biases, emphasizing the need for careful consideration when adopting these technologies for broader use.
Awards
- Phi Beta Kappa Honor Society (2025)
- Carleton College Chang-Lan Award (2024)
- Caltech SURF Award (2023)
- Carleton College Dean's List (2023)
Academic Services
- Reviewer for conferences: ICLR, ICML, COLM, COLING.
- Reviewer for workshops: Re-Align, LLM-Cognition, BehaviorML, LTEDI, INTERPLAY, AI4Math, LatinX, Assessing World Models.