Inflection, a well-funded AI startup aiming to create “personal AI for everyone,” has taken the wraps off the large language model powering its Pi conversational agent. It’s hard to evaluate the quality of these things in any way, let alone objectively and systematically, but a little competition is a good thing.
Inflection-1, as the model is called, is of roughly GPT-3.5 (AKA ChatGPT) size and capabilities — as measured in the computing power used to train them. The company claims that it’s competitive or superior with other models on this tier, backing it up with a “technical memo” describing some benchmarks it ran on its model, GPT-3.5, LLaMA, Chinchilla, and PaLM-540B.
According to the results they published, Inflection-1 indeed performs well on various measures, like middle- and high-school level exam tasks (think biology 101) and “common sense” benchmarks (things like “if Jack throws the ball on the roof, and Jill throws it back down, where is the ball?”). It mainly falls behind on coding, where GPT-3.5 beats it handily and, for comparison, GPT-4 smokes the competition; OpenAI’s biggest model is well known to have been a huge leap in quality there, so it’s no surprise.
Inflection notes that it expects to publish results for a larger model comparable to GPT-4 and PaLM-2(L), but no doubt they are waiting until the results are worth publishing. At any rate Inflection-2 or Inflection-1-XL or whatever is in the oven but not quite baked.
So far the community hasn’t hasn’t formally divided AI models into the machine learning equivalent of boxing weight classes, but the concepts do map to one another quite well. You don’t expect a flyweight to go up against a heavyweight, they’re practically different sports. Same with AI models: a small one isn’t as capable as a large one, but the small one runs efficiently on a phone while the large one requires a datacenter. It’s an apples to oranges thing.
It’s still too early to attempt such a thing, since the field is still comparatively young and there’s no real consensus on what sizes and shapes of AI model should be considered of a feather.
Ultimately for most of these models the proof of the pudding is in the tasting, of course, and until Inflection opens up its model to widespread use and independent evaluation, all its vaunted benchmarks must be taken with a grain of salt. If you want to give Pi a shot, you can just add it on one of your messaging apps, or chat with it online here.