Stack Overflow, the go-to question-and-answer site for coders and programmers, has temporarily banned users from sharing responses generated by AI chatbot ChatGPT.
The site’s mods said that the ban was temporary and that a final ruling would be made some time in the future after consultation with its community. But, as the mods explained, ChatGPT simply makes it too easy for users to generate responses and flood the site with answers that seem correct at first glance but are often wrong on close examination.
“The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce,” wrote the mods (emphasis theirs). “As such, we need the volume of these posts to reduce […] So, for now, the use of ChatGPT to create posts here on Stack Overflow is not permitted. If a user is believed to have used ChatGPT after this temporary policy is posted, sanctions will be imposed to prevent users from continuing to post such content, even if the posts would otherwise be acceptable.”
ChatGPT is an experimental chatbot created by OpenAI and based on its autocomplete text generator GPT-3.5. A web demo for the bot was released last week and has since been enthusiastically embraced by users around the web. The bot’s interface encourages people to ask questions and in return offers impressive and fluid results across a range of queries, from generating poems, songs, and TV scripts to answering trivia questions and writing and debugging lines of code.
But while many users have been impressed by ChatGPT’s capabilities, others have noted its persistent tendency to generate plausible but false responses. Ask the bot to write a biography of a public figure, for example, and it may well insert incorrect biographical data with complete confidence. Ask it to explain how to program software for a specific function and it can similarly produce believable but ultimately incorrect code.
This is one of several well-known failings of AI text generation models, otherwise known as large language models or LLMs. These systems are trained by analyzing patterns in huge reams of text scraped from the web. They look for statistical regularities in this data and use these to predict what words should come next in any given sentence. This means, though, that they lack hard-coded rules for how certain systems in the world operate, leading to their propensity to generate “fluent bullshit.”
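To make the idea of predicting the next word from statistical regularities concrete, here is a deliberately tiny sketch: a bigram counter that picks whichever word most often followed the previous one in its training text. It is an illustration only, not how ChatGPT is built (GPT-3.5 is a large neural network trained on vastly more data), and the corpus and function names are invented for this example. What it does share with larger systems is that it has no notion of whether its output is true, only of what tends to come next.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, standing in for "huge reams of text".
corpus = "the cat sat on the mat . the cat ate the fish ."
counts = train_bigram_counts(corpus)

print(predict_next(counts, "the"))  # prints "cat", the most common follower
print(predict_next(counts, "dog"))  # prints None, since "dog" was never seen
```

The model answers "cat" after "the" purely because that pairing was most frequent, not because it knows anything about cats.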
Given the huge scale of these systems, it’s impossible to say with certainty what percentage of their output is false. But in Stack Overflow’s case, the company has judged for now that the risk of misleading users is just too high.
Stack Overflow’s decision is particularly notable as experts in the AI community are currently debating the potential threat posed by these large language models. Yann LeCun, chief AI scientist at Facebook-parent Meta, has argued, for example, that while LLMs can certainly generate bad output like misinformation, they don’t make the actual sharing of this text any easier, which is what causes harm. Others say the potential for these systems to generate text cheaply and at scale necessarily increases the risk that it is later shared.
To date, there’s been little evidence of the harmful effects of LLMs in the real world. But these recent events at Stack Overflow support the argument that the scale of these systems does indeed create new challenges. The site’s mods say as much in announcing the ban on ChatGPT, noting that the “volume of these [AI-generated] answers (thousands) and the fact that the answers often require a detailed read by someone with at least some subject matter expertise in order to determine that the answer is actually bad has effectively swamped our volunteer-based quality curation infrastructure.”
The worry is that this pattern could be repeated on other platforms, with a flood of AI content drowning out the voices of real users with plausible but incorrect data. Exactly how this could play out in different domains around the web, though, would depend on the exact nature of the platform and its moderation capabilities. Whether or not these problems can be mitigated in the future using tools like improved spam filters remains to be seen.
Meanwhile, responses to Stack Overflow’s policy announcement on the site’s own discussion boards and on related forums like Hacker News have been broadly supportive, with users adding the caveat that it may be difficult for Stack Overflow’s mods to identify AI-generated answers in the first place.
Many users have recounted their own experiences using the bot, with one individual on Hacker News saying they found that its answers to queries about coding problems were more often wrong than right. “The scary part was just how confidently incorrect it was,” said the user. “The text looked very good, but there were big errors in there.”
Others turned the question of AI moderation over to ChatGPT itself, asking the bot to generate arguments for and against its ban. In one response the bot came to the exact same conclusion as Stack Overflow’s own mods: “Overall, whether or not to allow AI-generated answers on Stack Overflow is a complex decision that would need to be carefully considered by the community.”