Mechanistic Science of AI

Opening the black box of neural language models

Cognitive science studies the mind through behavior; neuroscience asks what mechanisms inside the brain give rise to that behavior. A century of probing, recording, and lesioning has revealed that complex cognition arises from structured neural systems — specialized regions, modular networks, and characteristic circuits — rather than from a homogeneous substrate. Today’s neural language models pose an analogous question: behind their behavior lies a vast network of weights, activations, and attention patterns, but what kind of computational structure actually gives rise to what we observe? Mechanistic science of AI seeks to open this black box, drawing methods from systems neuroscience and interpretability to identify the circuits, representations, and algorithms inside models that produce intelligent behavior.

Papers

Work in progress — papers in this thread will appear here as they are released. In the meantime, see the behavioral science and building better AI pages, or feel free to reach out to chat about ongoing directions.
