A large language model's behavior issues and personality quirks should be tested as any safety risk would, according to OpenAI's postmortem after an update to ChatGPT-4o made the model sycophantic.
In a postmortem, OpenAI detailed what went wrong with the GPT-4o update, why the model was rolled back and what the company is doing to prevent personality issues in the future.
Here's the short version: OpenAI launched the update to GPT-4o and real-world interactions quickly revealed the models was a sycophant. It didn't take long until users noted that GPT-4o suddenly went over-the-top with the flattery, which was obviously insincere (as if a model could be sincere).
The postmortem on the GPT-4o rollout reads like an outage report. OpenAI's take is also instructive as model creators aim to put more personality and emotional intelligence into models.
Going forward, OpenAI said it will treat model behavior issues as launch blocking as it would other safety risks. OpenAI noted that its testing on general model behavior "has been less robust and formalized relative to areas of currently tracked safety risks."
OpenAI also said that it needs to better measure signals on model behavior and acknowledged that it won't predict every issue. The other takeaway from OpenAI is that there is no small launch.
In its blog post on the issue, OpenAI said:
"One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice—something we didn’t see as much even a year ago. At the time, this wasn’t a primary focus, but as AI and society have co-evolved, it’s become clear that we need to treat this use case with great care. It’s now going to be a more meaningful part of our safety work. With so many people depending on a single system for guidance, we have a responsibility to adjust accordingly."
OpenAI said it has been updating GPT-4o to make it more helpful and personable. It's part of an effort to give LLMs more emotional intelligence. It has been clear for the last year that frontier models each have their own personalities. OpenAI botched the reward signals that kept GPT-4o's sycophancy in check. The solution was to roll back the model, a chore that took 24 hours.
The OpenAI post is worth a read because it highlights a few key items or me:
- LLM personalities matter.
- Model behavior is essentially the new UI and the thing that's likely to be sticky over time. After all, you're more likely to stick with a model you actually like personally.
- Qualitative features may be the most important with LLMs. Does the average bear really notice if one model has a math score 1% better than another. Ditto for code or half the other metrics that are benchmarked.
- Emotional intelligence in models (faked just like humans do) is a key feature, but can go horribly wrong.
- LLMs are simply software and the personification of them is likely to lead to more issues like this OpenAI rollback.
- Credit OpenAI for the transparency.