OpenAI is forming a new team led by Ilya Sutskever, its chief scientist and one of the company’s co-founders, to develop ways to steer and control “superintelligent” AI systems.
In a blog post published today, Sutskever and Jan Leike, a lead on the alignment team at OpenAI, predict that AI with intelligence exceeding that of humans could arrive within the coming decade. This AI — assuming it does, indeed, eventually arrive — won’t necessarily be benevolent, necessitating research into ways to control it, Sutskever and Leike say.
“Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue,” they write. “Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us.”
To move the needle forward in the area of “superintelligence alignment,” OpenAI is creating a new Superalignment team, led by both Sutskever and Leike, which will have access to 20% of the compute the company has secured to date. Joined by scientists and engineers from OpenAI’s previous alignment team as well as researchers from other organizations across the company, the team will aim to solve the core technical challenges of controlling superintelligent AI over the next four years.
How? By building what Sutskever and Leike describe as a “human-level automated alignment researcher.” The goal is to train AI systems using human feedback, train AI to assist in human evaluation and ultimately build AI that can do alignment research. (Here, “alignment research” refers to ensuring AI systems achieve desired outcomes.)
OpenAI’s hypothesis is that AI can make faster and better progress on alignment research than humans can.
“As we make progress on this, our AI systems can take over more and more of our alignment work and ultimately conceive, implement, study and develop better alignment techniques than we have now,” Leike and colleagues John Schulman and Jeffrey Wu explain in a previous blog post. “They will work together with humans to ensure that their own successors are more aligned with humans. Human researchers will focus more and more of their effort on reviewing alignment research done by AI systems instead of generating this research by themselves.”
Of course, no method is foolproof, and Leike, Schulman and Wu acknowledge the many limitations of OpenAI’s approach in their post. Using AI for evaluation has the potential to scale up inconsistencies, biases or vulnerabilities in that AI. And it might turn out that the hardest parts of the alignment problem aren’t related to engineering at all.
But Sutskever and Leike think it’s worth a go.
“Superintelligence alignment is fundamentally a machine learning problem, and we think great machine learning experts — even if they’re not already working on alignment — will be critical to solving it,” they write. “We plan to share the fruits of this effort broadly and view contributing to alignment and safety of non-OpenAI models as an important part of our work.”