research institute

Anthropic Research

Berkeley, United States

Founded 2022

20+ employees

About

Anthropic Research develops evaluation frameworks, most notably ARC Evals, to assess the safety and reliability of large language models (LLMs). Its central technique is model-written evaluations, in which LLMs generate challenging test cases that probe other models for potentially harmful behaviors. Paired with interactive data-visualization tools, this approach lets Anthropic systematically map LLM behaviors and offer insights for improving alignment and reducing the risks posed by frontier AI systems. Its work serves researchers and developers focused on responsible AI development.