Adaptive and Aligned AI

Closing the gap with biological intelligence

AI has become remarkably powerful, generating text, code, and images at scales unthinkable a decade ago. Yet, as Surya Ganguli has recently argued, biological intelligence still outperforms it by orders of magnitude along several axes[1]. Children learn language from roughly a hundred million words; large models are trained on ten trillion, five orders of magnitude more. Brains run on twenty watts; training a frontier model can consume tens of megawatts. People generalize from a handful of examples and remain reliable under conditions they have never seen; even the largest models break in ways no human would. Closing these gaps is not a matter of scale alone. It will require rethinking how AI systems are designed, drawing on what we know about how biological minds solve the same problems.

My work focuses on building AI systems that are more adaptive (Steer2Adapt, Adaptation), continuously evolving (Thought-Retriever, Paper Copilot), and safe (Debiasing).

Papers

	Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs
	Pengrui Han, Xueqiang Xu, Keyang Xuan, Peiyang Song, Siru Ouyang, Runchu Tian, Yuqing Jiang, Cheng Qian, Pengcheng Jiang, Jiashuo Sun, Junxia Cui, Ming Zhong, Ge Liu, Jiawei Han, and Jiaxuan You
	ICLR Workshop on Representational Alignment (Re-Align), 2026; Under Review
	arXiv
	Most steering methods use a single static direction per task or concept. We propose Steer2Adapt, a lightweight framework that adapts LLMs by composing steering vectors rather than learning new ones from scratch: it captures shared concept dimensions as a reusable subspace and dynamically combines them from just a handful of examples. Across 9 tasks and 3 models, this yields a stable, transparent, and data-efficient inference-time adaptation method (+8.2% on average).

	Adaptation of Agentic AI
	Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Heng Ji, Yejin Choi, Dawn Song, Jimeng Sun, Jiawei Han ( Equal Contribution)
	Preprint, 2025
	arXiv / repo / media
	Cutting-edge agentic AI systems are built on foundation models that can be adapted to plan, reason, and interact with external tools to perform increasingly complex and specialized tasks. We unify the rapidly expanding research landscape into a systematic framework that spans both agent adaptations and tool adaptations.

	Thought-Retriever: Don’t Just Retrieve Raw Data, Retrieve Thoughts
	Tao Feng, Pengrui Han*, Guanyu Lin, Ge Liu, Jiaxuan You (* Equal Contribution)
	Transactions on Machine Learning Research (TMLR), 2026
	view / project / paper
	We introduce Thought-Retriever, a novel model-agnostic algorithm that enables LLMs to effectively utilize external data without being limited by context length.

	Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
	Guanyu Lin*, Tao Feng*, Pengrui Han*, Ge Liu, Jiaxuan You (* Equal Contribution)
	System Demonstration Track of EMNLP, 2024
	view / arXiv / Huggingface Demo
	We propose an LLM-based research copilot that automates personalized academic services (finding, recommending, and synthesizing papers) and continuously self-improves through user interactions.

	ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
	Pengrui Han*, Rafal Kocielnik*, Adhithya Saravanan, Roy Jiang, Or Sharir, and Anima Anandkumar (* Equal Contribution)
	Conference On Language Modeling (COLM), 2024
	view / arXiv / code / proceeding
	We propose a light and efficient pipeline that enables both domain and non-domain experts to quickly generate synthetic debiasing data to mitigate specific or general bias in their models with parameter-efficient fine-tuning.

	Exploring Social Bias in Downstream Applications of Text-to-Image Foundation Models
	Adhithya Saravanan, Rafal Kocielnik, Roy Jiang, Pengrui Han, and Anima Anandkumar
	NeurIPS Workshop on Failure Modes in the Age of Foundation Models, 2023 · PMLR
	arXiv
	We explore the social biases in text-to-image diffusion models used in commercial applications like image editing. By analyzing models like Stable Diffusion, we uncover significant biases, emphasizing the need for careful consideration when adopting these technologies for broader use.

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Guanyu Lin*, Tao Feng*, Pengrui Han*, Ge Liu, Jiaxuan You (* Equal Contribution)

System Demonstration Track of EMNLP, 2024

arXiv / HuggingFace Demo

Abstract. As scientific research proliferates, researchers face the daunting task of navigating and reading vast amounts of literature. Existing solutions, such as document QA, fail to provide personalized and up-to-date information efficiently. We present Paper Copilot, a self-evolving, efficient LLM system designed to assist researchers, built on thought retrieval, user profiles, and high-performance optimization. Specifically, Paper Copilot offers personalized research services while maintaining a database that is updated in real time. Quantitative evaluation demonstrates that Paper Copilot saves 69.92% of time after efficient deployment. This paper details the design and implementation of Paper Copilot, highlighting its contributions to personalized academic support and its potential to streamline the research process.

ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs

Pengrui Han*, Rafal Kocielnik*, Adhithya Saravanan, Roy Jiang, Or Sharir, and Anima Anandkumar (* Equal Contribution)

Conference On Language Modeling (COLM), 2024

arXiv / Code / Proceeding

Abstract. We propose a light and efficient pipeline that enables both domain and non-domain experts to quickly generate synthetic debiasing data to mitigate specific or general bias in their models with parameter-efficient fine-tuning.