Data Machina #213
New SOTA?? AI. Audiocraft. FlagEmbedding. MetaGPT multi agents. Qwen 7B. UnIVAL multimodal. USearch VSS. WizMap. ToolLlama. Active Sampling. Conformal Predictions for Time-Series.
New SOTA?? AI. Every week or so, dozens of AI papers and models emerge claiming SOTA. What happens next is a short-lived wave of frenetic benchmarking and arguing: is this SOTA? Yes, no, maybe… Then the fever fizzles out until the next SOTA arrives. Whatever the SOTA, here is a list of really interesting models and projects:
Generative AI audio. Four days ago, Meta AI open-sourced AudioCraft, a SOTA library for generative audio. AudioCraft is a one-stop codebase for all your generative audio needs: music, sound effects, and compression, after training on raw audio signals. The library includes MusicGen and AudioGen, each based on a single autoregressive language model that operates over streams of compressed, discrete music representations.
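The core mechanism here is worth seeing in miniature: the model predicts one discrete audio token at a time, conditioned on the tokens so far, and a neural codec later decodes the token stream into a waveform. The toy sampler below is purely illustrative (it is not AudioCraft's API; the transition table stands in for a learned language model):

```python
import random

# Toy illustration of autoregressive generation over discrete token
# streams, the idea behind MusicGen/AudioGen. This is NOT AudioCraft's
# real API: the hypothetical TRANSITIONS table stands in for a learned
# LM, and a codec (EnCodec, in AudioCraft) would decode the stream.
TRANSITIONS = {
    0: [(1, 0.7), (2, 0.3)],
    1: [(2, 0.6), (0, 0.4)],
    2: [(0, 0.5), (1, 0.5)],
}

def sample_next(token: int, rng: random.Random) -> int:
    """Sample the next discrete code from the toy conditional distribution."""
    r = rng.random()
    cumulative = 0.0
    for nxt, p in TRANSITIONS[token]:
        cumulative += p
        if r < cumulative:
            return nxt
    return TRANSITIONS[token][-1][0]

def generate(start: int, length: int, seed: int = 0) -> list[int]:
    """Autoregressively roll out a stream of discrete audio codes."""
    rng = random.Random(seed)
    stream = [start]
    for _ in range(length - 1):
        stream.append(sample_next(stream[-1], rng))
    return stream

codes = generate(start=0, length=8)
print(codes)
```

In the real models the conditional distribution comes from a transformer, and the "codes" are the quantized latents of a neural audio codec rather than three made-up states.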
Audiovisual AI scene detection. In June, Netflix posted about their latest research claiming SOTA on audiovisual scene detection. Movies and TV episodes are not atomic units; they are composed of smaller elements such as frames, shots, scenes, sequences, and acts. Understanding these elements and how they relate to each other is crucial for tasks like video summarisation, highlights detection, and content-based video retrieval. Blog post: Detecting Scene Changes in Audiovisual Content
Multimodal AI. Researchers at Salesforce Research recently tweaked Instructblip-vicuna-13b, which is based on InstructBLIP, an instruction-tuned vision-language model. In the latest multimodal LLM benchmark, published a few days ago, Instructblip-vicuna-13b is now leading the pack. Paper: Benchmarking Multimodal LLMs with Generative Comprehension
New instructed Llama v2. A week ago, a little-known Korean startup released Llama-2-70b-instruct-v2. The model was fine-tuned using Microsoft's DeepSpeed library and Orca- and Vicuna-style datasets. According to the Hugging Face Open LLM Leaderboard, Llama 2 70B Instruct v2 is leading across all evaluation scores.
Embedding models. Five days ago, researchers at the Beijing Academy of AI published a new embedding model, FlagEmbedding bge-large-en, which -according to the MTEB Leaderboard- outperforms such powerful embedding models as Microsoft's E5 and OpenAI's text-embedding-ada-002. Some colleagues tell me this is quite a surprise, and a refreshing one.
And just a few days before, Alibaba was claiming SOTA with a family of nifty embedding models -based on the BERT framework- for information retrieval, semantic textual similarity, and text reranking. Check out GTE-Base, GTE-Small, and GTE-Large, which also beat OpenAI's text-embedding-ada-002 and Microsoft's E5.
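Whichever embedding model wins the leaderboard, the downstream usage pattern is the same: embed a query and your documents, then rank by cosine similarity. A minimal sketch, with made-up 3-d vectors standing in for real model outputs (bge-large-en and the GTE models return vectors with hundreds of dimensions):

```python
import math

# Retrieval with embeddings in a nutshell: embed query + documents with a
# model such as bge-large-en or GTE, then rank documents by cosine
# similarity. The tiny 3-d vectors below are made up for illustration.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.9, 0.1]        # hypothetical embedding of the query
docs = {
    "doc_a": [0.1, 0.8, 0.2],  # semantically close to the query
    "doc_b": [0.9, 0.1, 0.0],  # unrelated
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```

This is also the operation MTEB's retrieval and STS tracks ultimately score, which is why a better embedding model moves the leaderboard.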
Largest open-source geospatial AI foundation model. This week NASA and IBM open-sourced a new AI foundation model based on the vision transformer (ViT) and the masked autoencoder. The model was trained on labeled images of floods and burn scars from wildfires. Researchers claim the model outperforms the accuracy of current SOTA DL models by 15%. Read more here: IBM and NASA open source the largest geospatial AI foundation model on Hugging Face.
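The masked-autoencoder pretraining behind this model is simple to sketch: split each image into patches, hide a large fraction of them, and train the network to reconstruct the hidden patches from the visible ones. The masking step below is a toy sketch; the patch count and mask ratio are illustrative defaults, not the NASA/IBM model's actual configuration:

```python
import random

# Sketch of the random patch masking used in masked-autoencoder (MAE)
# pretraining. The encoder sees only the visible patches; the decoder is
# trained to reconstruct the masked ones. Values here are illustrative.

def mask_patches(num_patches: int, mask_ratio: float, seed: int = 0):
    """Randomly split patch indices into visible and masked sets."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked

# A 224x224 image cut into 16x16 patches yields 14 * 14 = 196 patches.
visible, masked = mask_patches(num_patches=196, mask_ratio=0.75)
print(len(visible), len(masked))  # 49 147
```

Hiding most of the image makes reconstruction a hard pretext task, which is what lets such models pretrain on huge volumes of unlabeled satellite imagery before fine-tuning on labeled floods and burn scars.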
Have a nice week.
10 Link-o-Troned
the ML Pythonista
LLL - The Little LLM List
AI/ DL ResearchDocs
data v-i-s-i-o-n-s
MLOps Untangled
AI startups -> radar
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.