Data Machina #176
LLM hallucinations, AlphaZero's chess knowledge, visual-text pre-training, Tsetlin machines for AI, visualising transformer attention, an RNN language model, versatile diffusion.
Large Language Models (LLMs) hallucinating. Meta AI & Papers with Code set out on a very interesting, ambitious mission: organising scientific knowledge with an LLM.
They crunched 48 million papers and textbooks into an LLM for science. Five days ago they published an online demo called Galactica. “You can use it to write scientific papers & code, summarise academic literature, solve math problems,” they wrote.
Soon enough, inevitably, Twitter was flooded with examples of Galactica’s hallucinations: racist, wrong, totally inaccurate or simply invented (albeit authoritative-sounding) output, like the fictional Streep-Seinfeld theorem.
A few days ago, Meta AI decided to take down Galactica’s demo. Gary Marcus, a researcher who’s been very critical of the deep implications of LLM hallucinations, posted a video on the limitations of DL and LLMs, and why we need neurosymbolic AI for robust AI.
LLM hallucination is indeed a concern for AI researchers. A team at the AI Center, University of Hong Kong just published Survey of Hallucination in Natural Language Generation, in which they provide a taxonomy of LLM hallucinations and ideas for remediating the issue.
A frequent issue with LLMs is that they’re pre-trained on human baseline documents, treating all training data as positive examples. In a paper published this week, titled The CRINGE Loss, another team @MetaAI has come up with a new contrastive learning approach for also training the LLM on what it should not do.
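The core idea is simple to sketch: for each token of a known-bad (negative) sequence, contrast its score against a plausible alternative drawn from the model’s own top-k predictions, pushing the bad token down. Here is a minimal, illustrative sketch of that per-token contrastive term in NumPy; the function name and the pairwise logistic form are assumptions for illustration, not the paper’s exact implementation.

```python
import numpy as np

def cringe_token_loss(logits, neg_token, k=5, rng=None):
    """Contrastive loss for one token of a *negative* example.

    Instead of only maximising likelihood on positive data, contrast the
    negative token against a "positive" token sampled from the model's
    own top-k predictions, and penalise the model when the negative
    token scores higher. Illustrative sketch, not the paper's code.
    """
    rng = rng or np.random.default_rng(0)
    # Top-k candidate tokens by score, excluding the negative token itself.
    order = np.argsort(logits)[::-1]
    candidates = [t for t in order[: k + 1] if t != neg_token][:k]
    # Sample one contrasting token, weighted by its softmax probability.
    cand_logits = logits[candidates]
    probs = np.exp(cand_logits - cand_logits.max())
    probs /= probs.sum()
    pos_token = rng.choice(candidates, p=probs)
    # Pairwise logistic loss: drive s_pos above s_neg.
    margin = logits[pos_token] - logits[neg_token]
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Toy vocabulary of 6 tokens. When the negative token is the model's top
# choice, the loss is large; when it is already unlikely, the loss is small.
logits = np.array([3.0, 1.0, 0.5, 0.2, -1.0, -2.0])
loss_bad = cringe_token_loss(logits, neg_token=0)  # model loves the bad token
loss_ok = cringe_token_loss(logits, neg_token=5)   # bad token already unlikely
```

In training, this term would be averaged over the tokens of negative sequences and added to the usual cross-entropy loss on positive data.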
On a practical note, a while back Sandra @CohereAI wrote about 5 Ways to Tackle the Challenges of Large Language Models.
The Stanford Institute for Human-Centered AI is doing intensive research on the capabilities, limitations, and risks of LLMs. Just this week, they published Holistic Evaluation of Language Models, in which they define a taxonomy of LLM scenarios, and a series of metrics and benchmarks to better evaluate and understand LLMs.
Finally, I leave you with Linus’ funny rant on LLMs: “Willy-nilly spraying the GPT-3 next-token prediction powder on your tool/product is a recipe for disaster. Text generation is not the product.”
Have a nice week.
10 Link-o-Troned
A Pythonista *Experience*
Scripting aRt
Deep & Other Learning Bits
ResearchDocs
El Robótico
data v-i-s-i-o-n-s
DataEng Wranglings
startups -> radar
ML Datasets & Stuff
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.