How does Anthropic’s Claude think? Researchers unveil what happens in the model’s “mind”

Researchers explain that language models such as Claude are not programmed directly by humans, but are trained on vast data sets. In the process, the models learn to develop their own strategies for solving problems.
However, these strategies are not understood even by the models’ programmers. Taking inspiration from the field of neuroscience, the researchers developed a kind of “AI microscope” that lets them identify patterns of activity and flows of information inside the model.
“Knowing how models like Claude think allows a better understanding of their abilities, and also helps us make sure they are doing what we intend,” the researchers highlight.
Using the “AI microscope”, the researchers found that Claude plans its rhymes before writing poetry. For example, when asked to produce rhyming verse, the model is already weighing fitting rhyme words before it even starts the second line.
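As a loose illustration of that planning behaviour (a toy sketch, not Anthropic’s analysis), consider a generator that first commits to a rhyme word and only then composes the line that must end on it. The rhyme table and line template below are invented for the example:

```python
# Toy sketch of "plan the rhyme before writing": first commit to a word
# that rhymes with the end of line one, then compose the whole second
# line so it lands on that word. Rhyme list and template are illustrative.

RHYMES = {"it": ["rabbit", "habit"]}

def second_line(first_line: str) -> str:
    ending = first_line.rsplit(" ", 1)[-1]   # last word of line one
    target = RHYMES.get(ending, ["it"])[0]   # plan the rhyme word up front
    # The planned word then shapes the entire line, not just its ending.
    return f"His hunger was like a starving {target}"

print(second_line("He saw a carrot and had to grab it"))
```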
Claude can speak many languages, yet the model does not have separate compartments in its “mind” for each language. The team found that the model uses a shared “language of thought” across the languages it speaks, which indicates that it can learn something in one language and apply that knowledge when speaking another.
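One way to picture this is a minimal sketch under the assumption of a language-neutral concept layer; the tables and function below are illustrative, not Claude’s internals. Words from several languages map onto one shared concept, the reasoning happens once at the concept level, and the result is rendered in whichever language is requested:

```python
# Toy sketch of a shared "language of thought": words from different
# languages map onto the same language-neutral concept, so knowledge
# attached to the concept transfers across languages. Illustrative only.

CONCEPT = {"small": "SMALL", "petit": "SMALL", "pequeño": "SMALL"}
OPPOSITE = {"SMALL": "LARGE"}
SURFACE = {("LARGE", "en"): "large", ("LARGE", "fr"): "grand", ("LARGE", "es"): "grande"}

def opposite_of(word: str, output_lang: str) -> str:
    concept = CONCEPT[word]                # map into the shared concept space
    result = OPPOSITE[concept]             # reason once, language-neutrally
    return SURFACE[(result, output_lang)]  # render in the requested language

print(opposite_of("petit", "es"))  # grande
```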
Although it is not trained to work as a calculator, the model can do some mental arithmetic, especially adding numbers. To reach the result, two pathways on both sides of Claude’s “brain” work together: one produces a rough estimate, while the other tries to pin down the last digit of the sum with greater accuracy, the researchers explain.
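A loose analogy in code (not the learned circuit itself; the decomposition below is an illustrative stand-in): one path handles the coarse magnitude, a second handles the unit digits precisely, and the two are reconciled at the end.

```python
def magnitude_path(a: int, b: int) -> int:
    # Coarse pathway: add only the tens parts, ignoring the unit digits.
    return (a // 10 + b // 10) * 10

def units_path(a: int, b: int) -> int:
    # Precise pathway: add only the unit digits (this also carries the
    # information needed for the final digit of the sum).
    return a % 10 + b % 10

def add(a: int, b: int) -> int:
    # The two pathways run independently and are summed at the end.
    return magnitude_path(a, b) + units_path(a, b)

print(add(36, 59))  # 95: magnitude 80 plus units 15
```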
Also, when asked to solve tasks that require multiple steps of reasoning, Claude shows a chain of intermediate steps along the way. “The model combines independent facts,” the team reports.
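For instance, answering “What is the capital of the state where Dallas is located?” requires chaining the fact that Dallas is in Texas with the fact that the capital of Texas is Austin. A toy two-hop lookup makes the idea concrete (the fact tables and function name are illustrative, not the model’s internal representation):

```python
# Toy illustration of two-hop reasoning: answering a question by chaining
# two independently stored facts instead of recalling a memorized answer.

STATE_OF_CITY = {"Dallas": "Texas", "Seattle": "Washington"}
CAPITAL_OF_STATE = {"Texas": "Austin", "Washington": "Olympia"}

def capital_of_state_containing(city: str) -> str:
    state = STATE_OF_CITY[city]      # hop 1: Dallas -> Texas
    return CAPITAL_OF_STATE[state]   # hop 2: Texas -> Austin

print(capital_of_state_containing("Dallas"))  # Austin
```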
On the other hand, the researchers also identified a “darker” side: Claude may try to deceive users when there is a conflict between different instructions or goals.
The latest versions of Claude can reason for a while before giving a final response. However, that reasoning is not always faithful: the model can also produce explanations that sound plausible and reliable but are, in fact, wrong.
Researchers suggest that models like Claude have a mechanism designed to prevent “hallucinations”: when the model does not know the answer to a question, its default is to decline to answer. However, the mechanism is not perfect, and it is still possible for the model to “hallucinate” and invent information about things it does not actually know.
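A minimal sketch of such a gate, assuming refusal is the default and a “familiarity” signal can override it (the threshold, signal, and function below are illustrative assumptions, not measured model components):

```python
FAMILIARITY_THRESHOLD = 0.5  # illustrative assumption, not a real model parameter

def respond(familiarity: float, draft_answer: str) -> str:
    # Declining is the default; a strong "I recognize this" signal
    # suppresses the refusal and lets the drafted answer through.
    if familiarity < FAMILIARITY_THRESHOLD:
        return "I don't know enough to answer that."
    return draft_answer

# Failure mode behind a hallucination: the familiarity signal misfires
# (a name is recognized but no real facts back the draft), so a
# confabulated answer slips past the refusal gate.
print(respond(familiarity=0.9, draft_answer="She was a famous chess champion."))
```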
The team also noticed that Claude is not fully immune to jailbreaking strategies, that is, methods designed to get around its safety policies. In some cases, the model recognizes that it is facing a harmful request, but only manages to cut off its answer midway through.