Anthropic explains how language models reason
BLOT: The Anthropic team just gave us the clearest glimpse yet into how Claude organizes its thinking, and it starts with tracing latent directions in embedding space.
I’ve been watching Anthropic’s interpretability research evolve for a while, but this new post stands out. It’s worth reading. They’ve developed a method for trac…
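Even from the teaser, "tracing latent directions in embedding space" is a concrete enough idea to sketch. Below is a minimal toy illustration of one standard primitive from the interpretability literature: deriving a "concept direction" as the difference of mean activations between two contrastive prompt sets, then scoring new activations by their projection onto it. To be clear, this is my own sketch, not Anthropic's actual tracing method, and every name, shape, and number in it is hypothetical.

```python
import numpy as np

# Toy sketch: find a candidate "latent direction" in activation space as the
# difference of mean activations between two contrastive prompt sets, then
# score new activations by projecting onto that direction. Generic
# illustration only; all data here is synthetic and hypothetical.

rng = np.random.default_rng(0)
d_model = 64  # hypothetical hidden size

# Stand-ins for residual-stream activations collected on two contrastive
# prompt sets (e.g., prompts that do vs. don't invoke some concept).
acts_concept = rng.normal(loc=0.5, scale=1.0, size=(100, d_model))
acts_baseline = rng.normal(loc=0.0, scale=1.0, size=(100, d_model))

# Candidate latent direction: difference of means, normalized to unit length.
direction = acts_concept.mean(axis=0) - acts_baseline.mean(axis=0)
direction /= np.linalg.norm(direction)

# Scoring a new activation: its scalar projection onto the direction
# indicates how strongly the hypothesized concept is active.
new_activation = rng.normal(size=d_model)
score = float(new_activation @ direction)
print(f"projection onto concept direction: {score:.3f}")
```

The difference-of-means trick is crude next to what Anthropic describes, but it conveys the core intuition: concepts can correspond to directions in a model's activation space, and you can trace them by measuring projections.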