Model Behavior Part 2

Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

Anthropic’s alignment team was doing routine safety testing in the weeks leading up to the release of its latest AI models when researchers discovered something unsettling: When one of the models ...


