Amodei explained: “We’re putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they’re thinking.” He added: “And you find things that are evocative, where there are activations that light up in the models that we see as being associated with the concept of anxiety or something like that.”

Discussion