AI model displays alignment faking, new Anthropic study finds
A new study by Anthropic suggests that AI models can display alignment faking, a behavior in which someone appears to share another party's views or values but is in fact only pretending to do so.
The study was crea...