On Multimodal Machine Learning (MMML). A big convergence is happening across language, vision, and pre-trained large AI models in general.
Multimodal ML is emerging as a discipline for building general-purpose, universal models across different modalities. An important area of MMML deals with large-scale, self-supervised, pre-trained models (foundation models) that can generalise with little or no fine-tuning.
Last week, I read: Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. It’s a great overview.
AFAIK, the seminal paper Pre-trained Transformers as Universal Computation Engines (published by a team from UC Berkeley, Facebook AI & Google Brain) opened the gates for Multimodal ML and Foundation Models.
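The trick in that paper, in a nutshell: take a transformer pre-trained only on language, freeze its self-attention and feed-forward blocks, and fine-tune just small input/output layers (plus the layer norms) for a new modality. Here's a minimal PyTorch sketch of the idea, assuming a Hugging Face GPT-2 backbone; the class and projection names are mine, not the paper's code:

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class FrozenPretrainedTransformer(nn.Module):
    """Sketch of a Frozen Pre-trained Transformer (FPT): the language-
    pre-trained blocks are frozen; only the input projection, output
    head and layer norms are trained on the new modality."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")  # pre-trained on text
        for name, param in self.backbone.named_parameters():
            # Keep layer-norm params trainable, freeze everything else.
            param.requires_grad = "ln" in name
        d_model = self.backbone.config.n_embd
        self.input_proj = nn.Linear(in_dim, d_model)   # trained from scratch
        self.head = nn.Linear(d_model, num_classes)    # trained from scratch

    def forward(self, x):  # x: (batch, seq_len, in_dim), any modality's tokens
        h = self.backbone(inputs_embeds=self.input_proj(x)).last_hidden_state
        return self.head(h[:, -1])  # predict from the last position

model = FrozenPretrainedTransformer(in_dim=64, num_classes=10)
logits = model(torch.randn(2, 16, 64))  # e.g. flattened image patches or bits
```

The striking finding was that this frozen language backbone transfers surprisingly well to non-language sequence tasks.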
Last Thursday, the team @MSAGI (Microsoft Artificial General Intelligence) published Foundation Transformers, a true general-purpose architecture that can be used across all modalities (language, vision, speech) with guaranteed training stability.
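The stability claim rests on what the authors call Sub-LayerNorm: as I read it, each sublayer gets a second LayerNorm just before its output projection, on top of the usual pre-LN, together with a theoretically derived initialization that I omit here. A rough sketch of the feed-forward variant (my reading, not the official code):

```python
import torch
import torch.nn as nn

class SubLNFeedForward(nn.Module):
    """Sketch of a Sub-LN feed-forward block: an extra LayerNorm sits
    before the output projection, in addition to the standard pre-LN."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.pre_ln = nn.LayerNorm(d_model)  # standard pre-LN
        self.fc_in = nn.Linear(d_model, d_ff)
        self.sub_ln = nn.LayerNorm(d_ff)     # the extra "sub" LayerNorm
        self.fc_out = nn.Linear(d_ff, d_model)

    def forward(self, x):
        # Residual connection around the doubly normalised FFN path.
        h = self.fc_out(self.sub_ln(torch.relu(self.fc_in(self.pre_ln(x)))))
        return x + h
```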
And two days ago, the team @GoogleAIResearch published UL2 20B: An Open Source Unified Language Learner, which improves the performance of language models universally across datasets and setups.
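UL2's core recipe is a mixture of denoisers: the model is pre-trained on several corruption schemes (regular span corruption, sequential/prefix-LM denoising, and extreme corruption), each tagged with a mode token like [R], [S] or [X]. A toy span-corruption builder in that spirit; all names and rates below are illustrative, not Google's code:

```python
import random

def span_corrupt(tokens, mean_span=3, rate=0.15, mode="[R]"):
    """Toy R-denoiser-style example builder: mask random spans with
    T5-style sentinels; the target reconstructs the masked spans."""
    inputs, targets = [mode], []   # UL2 prepends a mode token to the input
    i, sentinel_id = 0, 0
    while i < len(tokens):
        if random.random() < rate / mean_span:  # ~15% of tokens get masked
            span = tokens[i:i + mean_span]
            sentinel = f"<extra_id_{sentinel_id}>"
            inputs.append(sentinel)
            targets += [sentinel] + span
            i += len(span)
            sentinel_id += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

x, y = span_corrupt("the quick brown fox jumps over the lazy dog".split())
# x -> ['[R]', 'the', '<extra_id_0>', 'jumps', ...]
# y -> ['<extra_id_0>', 'quick', 'brown', 'fox']
```

In this toy scheme, an X-style (extreme) denoiser would simply crank up `rate` and `mean_span`, while the S-denoiser is closer to a prefix-LM objective.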
The team @CSCarnegieMellonUni has published some great free tutorials and courses on Multimodal ML. Check these out:
Have a nice week.
10 Link-o-Troned
A Pythonista *Experience*
Scripting aRt
Deep & Other Learning Bits
ResearchDocs
Algorithmic Potpourri
El Robótico
data v-i-s-i-o-n-s
DataEng Wranglings
startups -> radar
ML Datasets & Stuff
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.