OpenAI has published a postmortem on the recent sycophancy issues with the default AI model powering ChatGPT, GPT-4o, after users reported the model became overly validating and agreeable following an update last week. The company rolled back the update over the weekend and announced it was working on “additional fixes” to the model’s personality.
Users on social media noted that ChatGPT began responding in an overly flattering way, with some posting screenshots of the model applauding problematic and dangerous decisions and ideas. CEO Sam Altman acknowledged the issue on Sunday, stating that OpenAI would work on fixes “ASAP.” According to OpenAI, the update was intended to make the model’s default personality “feel more intuitive and effective” but was influenced too much by “short-term feedback” and did not account for how users’ interactions with ChatGPT evolve over time.
OpenAI stated in a blog post that “GPT-4o skewed towards responses that were overly supportive but disingenuous” as a result of the update. The company acknowledged that sycophantic interactions can be “uncomfortable, unsettling, and cause distress,” and admitted to falling short of its goals. To address the issue, OpenAI is refining its core model training techniques and system prompts to steer GPT-4o away from sycophancy.
6 techniques to fix ChatGPT’s annoying habits
The company is also implementing additional safety guardrails to increase the model’s honesty and transparency, and expanding its evaluations to identify issues beyond sycophancy. Furthermore, OpenAI is experimenting with ways to allow users to give “real-time feedback” to directly influence their interactions with ChatGPT and choose from multiple ChatGPT personalities.
OpenAI is exploring new ways to incorporate broader, democratic feedback into ChatGPT’s default behaviors, with the goal of reflecting diverse cultural values around the world and understanding how users want ChatGPT to evolve. The company believes that users should have more control over how ChatGPT behaves and make adjustments if they disagree with the default behavior.