The Era of Wikipedia Is Ending! Meta's New AI Will Be the Successor
The next great breakthrough in AI and NLP, powered by Meta, could result in Wikipedia's replacement
Meta has made available a machine learning resource that could one day replace Wikipedia as the world’s largest publicly available knowledge-verification database. It’s called Sphere, and it’s capable of performing knowledge-intensive natural language processing, or KI-NLP. In practice, this means it can be used to find sources for claims and to answer complex questions in natural language. One example of its application is asking Sphere, “Who is Jolle Sambi Nzeba?” There is no entry for her on Wikipedia, but Sphere stated that she was “born in Belgium and raised in Kinshasa (Congo). She is currently residing in Brussels. She is a writer and slam poet, as well as a feminist activist” and linked to the website where the information about her work was obtained.
In a paper discussing the design of Sphere, Meta’s researchers note that Wikipedia has pretty much served as the corpus of record for such systems, calling the volunteer-maintained uber-wiki “accurate, well-structured, and small enough to use easily in testing environments.” But in order to create something bigger and better than Wikipedia, Meta gathered content from all over the web, excluding Wikipedia.org, to create a universal, uncurated, and unstructured knowledge source for multiple KI-NLP tasks at once. As a result, Sphere is essentially a mountain of processed data that can be queried using a variety of machine learning tools.
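To make the idea of querying an unstructured passage corpus concrete, here is a miniature sketch of source retrieval using simple bag-of-words cosine similarity over toy data. This is an illustration of the general retrieval pattern only, not Meta's actual code: Sphere uses learned dense retrievers over hundreds of millions of passages, and the corpus entries and URLs below are made up for the example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and strip basic punctuation from whitespace-split tokens."""
    return [t.strip(".,?!()\"'").lower() for t in text.split()]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "web corpus": (source URL, passage) pairs -- illustrative data only.
corpus = [
    ("https://example.org/poetry", "The author is a writer and slam poet residing in Brussels."),
    ("https://example.org/sports", "The match ended in a draw after extra time."),
    ("https://example.org/cooking", "Simmer the sauce until it thickens."),
]

# Precompute one term-count vector per passage.
index = [(url, Counter(tokenize(text))) for url, text in corpus]

def retrieve(query, k=1):
    """Return the URLs of the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [url for url, _ in ranked[:k]]
```

A query like `retrieve("Who is the slam poet in Brussels?")` surfaces the poetry passage's URL, which is the shape of the task Sphere performs at web scale when it finds a source for a claim.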
According to the team, Sphere can match and outperform Wikipedia-grounded baselines on some tasks of the KILT benchmark. That is, Sphere outperforms AI systems built on Wikipedia content. While the team did report that Sphere had some issues, its performance suggests that, at the very least, it can add value to KI-NLP tasks beyond what the Wikipedia corpora can provide. The main goal of Sphere was to see what effect replacing Wikipedia as a source had on the performance of knowledge-intensive systems.
The creators of Sphere assert that their work marks “the first time a general purpose search index improves language models on common sense tasks.” Meta has released the AI platform Sphere on GitHub, as it recently did with NLLB-200, which the Facebook parent said was the first translation AI to support 200 languages. Both Sphere and NLLB-200 have been put to work at Wikipedia: Sphere automatically checks citations in edited articles, while NLLB-200 helps translate pages into less widely used languages.
At 134 million documents split into 906 million passages, Sphere is larger than comparable web corpora. The next largest in terms of passages and documents, the Internet-Augmented Dialogue generator, draws on 250 million passages from 109 million documents.
However, the internet has no quality or accuracy controls, which the researchers acknowledge as a major obstacle to deploying this in practice. “The high caliber of the corpus documents can be assumed by researchers when Wikipedia is used as the information source. We no longer have the assurance that every document is good, accurate, or unique when moving to a web corpus,” the scientists wrote.
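The uniqueness problem the researchers mention is the most mechanical one: a web crawl contains many near-identical copies of the same text. A minimal sketch of one standard first step, exact-duplicate removal after light normalization, looks like the following. The normalization rules and sample passages are assumptions for illustration; real pipelines (Sphere's included) go much further, e.g. fuzzy near-duplicate detection and source-quality scoring.

```python
import hashlib
import re

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash alike.
    return re.sub(r"\s+", " ", text.lower()).strip()

def dedup(passages):
    """Keep the first occurrence of each passage, comparing by normalized hash."""
    seen, unique = set(), []
    for p in passages:
        digest = hashlib.sha1(normalize(p).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(p)
    return unique

docs = [
    "Sphere indexes the open web.",
    "sphere  indexes the open web.",  # near-identical copy, differs only in case/spacing
    "Wikipedia is curated by volunteers.",
]
```

Here `dedup(docs)` drops the second passage, keeping two unique ones; judging whether a surviving document is *good* or *accurate* is the far harder, still-open part of the problem.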
The developers of Sphere believe that future iterations should concentrate on evaluating the quality of the data it retrieves, spotting false claims and contradictions, figuring out how to prioritize reliable sources, and deciding when to refrain from answering a question due to a lack of information. In other words, making it genuinely useful. Sphere “may be the next great break in NLP” if Meta succeeds in turning it into a white-box AI with accurate and trustworthy information.