Pengrui Han (Barry)

I am an undergraduate student studying Math and CS at Carleton College (MN), advised by Prof. Anna Rafferty. I am a SURF Research Fellow at Caltech, where I am fortunate to work with Prof. Anima Anandkumar and Dr. Rafał Kocielnik. I am a Researcher in the ULab at UIUC, advised by Prof. Jiaxuan You.

I am currently applying for Ph.D. programs in Computer Science for the 2024–25 cycle.

韩芃睿  /  Email  /  Google Scholar  /  GitHub  /  LinkedIn  /  Twitter

News

[Oct. 2024] I am attending COLM 2024 to present our paper on synthetic data for debiasing.
[Oct. 2024] Our Paper Copilot is accepted to the EMNLP 2024 Demo Track.
[Sep. 2024] Our paper on LLM inhibitory control & A-not-B cognitive errors is accepted to EMNLP 2024 Findings.
[July 2024] Our paper on synthetic data for debiasing is accepted to COLM 2024.
[Apr. 2024] Our paper on Thought Retriever is accepted to ICLR 2024 Workshop on How Far Are We From AGI.
[Dec. 2023] I am joining UIUC ULab, working on foundation models and intelligent agents.

Research

My research interests center on understanding intelligence in both humans and machines. Currently, my focus is on investigating the cognitive limitations of foundation models and developing embodied and socially aware intelligent agents.

Selected Publications
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
Pengrui Han*, Peiyang Song*, Haofei Yu, and Jiaxuan You (* Equal Contribution)
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2024
code

Motivated by the crucial cognitive phenomenon of A-not-B errors, we present the first systematic evaluation of the surprisingly vulnerable inhibitory control abilities of LLMs. We reveal that this weakness undermines LLMs' trustworthy reasoning capabilities across diverse domains, and introduce various mitigations.

ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
Pengrui Han*, Rafal Kocielnik*, Adhithya Saravanan, Roy Jiang, Or Sharir, and Anima Anandkumar (* Equal Contribution)
Conference On Language Modeling (COLM), 2024
code

We propose a lightweight and efficient pipeline that enables both domain and non-domain experts to quickly generate synthetic debiasing data to mitigate specific or general bias in their models with parameter-efficient fine-tuning.

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Guanyu Lin*, Tao Feng*, Pengrui Han*, Ge Liu, Jiaxuan You (* Equal Contribution)
System Demonstration Track of Empirical Methods in Natural Language Processing (EMNLP), 2024
Huggingface Live Demo: Link

Thought-Retriever: Don’t Just Retrieve Raw Data, Retrieve Thoughts
Tao Feng*, Pengrui Han*, Guanyu Lin*, Ge Liu, Jiaxuan You (* Equal Contribution)
ICLR Workshop on How Far Are We From AGI, 2024

We introduce Thought-Retriever, a novel model-agnostic algorithm that enables LLMs to effectively utilize external data without being limited by context length.

Exploring Social Bias in Downstream Applications of Text-to-Image Foundation Models
Adhithya Saravanan, Rafal Kocielnik, Roy Jiang, Pengrui Han, and Anima Anandkumar
NeurIPS Workshop on Failure Modes in the Age of Foundation Models, 2023
Proceedings of Machine Learning Research (PMLR)

We explore the social biases in text-to-image diffusion models used in commercial applications like image editing. By analyzing models like Stable Diffusion, we uncover significant biases, emphasizing the need for careful consideration when adopting these technologies for broader use.

Awards
  • Carleton College Chang-Lan Award (2024)
  • Caltech SURF Award (2023)
  • Carleton College Dean's List (2023)

Site source