Data Machina

Data Machina #187

AI Generative Music: The Latest. Transformers Family v.2. ML Micromodels. Dreambooth in Keras. SoTA Multimodal CoT Reasoning LM. BLIP-2 Zero shot vision to language. Diffusion-based video editing.

Carlos
Feb 5, 2023

On the Latest in AI Generative Music. Music to Your Ears? Perhaps. AI Generative Music is still a bit behind AI Generative Text: generating high-quality, long musical pieces remains a challenge. But recently, some amazing research has emerged addressing that challenge. This is my best summary of what happened in AI Generative Music between October 2022 and February 2023.

In late October, researchers at the startup Mubert published a demo: Mubert-Text-to-Music. It's probably one of many good examples of what's coming in AI Generative Music.

Many people complain that AI-generated music doesn't sound very human. A team @UniCampBR introduced a new approach to generating music with sentiment using Transformer-GANs: the model generates music conditioned on human affective states.

If you are interested in affective AI music generation, this is a good read: AI-Based Affective Music Generation Systems: A Review of Methods, and Challenges

Computation in any type of Generative AI is super expensive. @kinyugo, an independent AI researcher, released a novel diffusion-based model for generating long-context, hi-fi music efficiently. Check out Msanii: Hi-Fi Music Generation on a Shoestring (paper, demo & notebook).

Due to the way generative AI models work, another issue in AI generative music is generating pieces of long enough duration, a similar issue to long-form generative AI text. @abhinav @Purdue_Uni introduced a new Multi-Genre Transformer that generates full-length, new musical pieces.

Last week, I mentioned Google Research’s MusicLM: Generating Music from Text, which claims a new SoTA in AI music generation.

Google Research didn't open-source MusicLM's code, but they did open-source the MusicCaps dataset. It contains 5,521 music examples, each labeled with an English aspect list and a free-text caption written by musicians.

A few days ago, the amazing @lucidrains released a PyTorch (WIP) implementation of Google MusicLM.

Generating high-quality, high-fidelity music with AI is a very complex task. Music generation requires dealing with key aspects like pitch variance, rhythm, the temporal dimension, long-term structure, and multiple layers of overlapping sounds.

The team @Acids_Ircam published the official Python code of a new model that solves some of the inherent issues in AI music generation. Check out RAVE: A variational autoencoder for fast and high-quality neural audio generation.
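At the heart of any variational autoencoder, including audio models like RAVE, is the reparameterization trick: the encoder predicts a mean and a (log-)variance, and a latent is sampled as z = mu + sigma * eps so gradients can flow through the sampling step. A toy numpy sketch of that step — the `encode` function is a stand-in for a learned network, not RAVE's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, latent_dim=8):
    """Toy 'encoder': map an audio frame to a latent mean and log-variance.

    A real model would use a neural network here; this is just a stand-in.
    """
    mu = 0.1 * x[:latent_dim]          # illustrative projection
    log_var = np.zeros(latent_dim)     # unit variance for the sketch
    return mu, log_var

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps; differentiable w.r.t. mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

frame = rng.standard_normal(256)       # one frame of "audio"
mu, log_var = encode(frame)
z = reparameterize(mu, log_var)        # latent the decoder would consume
```

The decoder then maps `z` back to a waveform; training balances reconstruction quality against keeping the latent distribution close to a standard Gaussian.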

It seems that several independent researchers have been able to generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Check out Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion (paper, code, demo).

Some anonymous researcher(s) published Noise2Music, which is a series of diffusion models trained to generate high-quality 30-second music clips from text prompts.
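Diffusion models like these work by progressively adding Gaussian noise to training audio and learning a network to reverse that corruption. A minimal numpy sketch of the forward (noising) process only, with an illustrative linear beta schedule — the schedule values and step count are assumptions for the demo, not taken from Noise2Music:

```python
import numpy as np

rng = np.random.default_rng(42)

T = 1000                                # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # noise schedule (illustrative)
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0): scale the clean clip, add Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

clip = rng.standard_normal(16000)       # 1 second of "audio" at 16 kHz
x_mid = q_sample(clip, t=500)           # partly noised
x_end = q_sample(clip, t=T - 1)         # almost pure noise
```

Generation runs this in reverse: starting from pure noise, a learned denoiser (conditioned on the text prompt) removes a little noise per step until a clean clip emerges.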

A team from several UK universities has claimed SoTA with a text-to-audio model trained on a single GPU. Check out the paper, demo & code here: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.

I must admit that I still have recurring nightmares from participating in some notorious performances in karaoke Star Pubs around the City of London. Well, it happens that some researchers have come up with a sort of reverse AI karaoke. See: SingSong: Generating musical accompaniments from singing. More AI singing nightmares?

Not sure about you, but I’ve got this compulsive behaviour of frantically skipping my Spotify stream recommendations whenever I don’t like them, which is quite often. So I wasn’t surprised when I came across: Why People Skip Music? On Predicting Music Skips using Deep Reinforcement Learning

Collaborative Filtering and Matrix Factorisation have been used in music recsys for many years. But this type of recsys approach has well-known issues like cold start, sparsity, lack of serendipity, and so on.
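As a refresher, classic matrix factorisation learns user and item embeddings whose dot product approximates the observed ratings; unrated cells stay empty, which is exactly where the sparsity and cold-start problems come from. A tiny numpy sketch trained with SGD on a toy ratings matrix (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy user x item ratings; 0 means "not rated" (the sparsity problem).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
P = 0.1 * rng.standard_normal((n_users, k))   # user factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item factors

lr, reg = 0.01, 0.02
for _ in range(2000):
    for u, i in zip(*np.nonzero(R)):          # train on observed cells only
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

pred = P @ Q.T                                # fills in the missing cells
```

A brand-new user has no row in `R` at all, so `P` has nothing to learn from: that is the cold-start problem in one line, and why hybrid and content-aware models keep appearing.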

Now, a lot of music recsys research focuses on neural recommenders involving neural embeddings, hybrid recsys models, or graph neural nets. This is an example of a hybrid approach that combines an item-based variational autoencoder (VAE) with Bayesian personalised ranking matrix factorisation (BPRMF) to produce more accurate, fair, and diverse music recommendations.

Although not specific to music recsys, I enjoyed reading: A Survey of Graph Neural Networks for Recommender Systems: Challenges, Methods, and Directions.

I read that @subtech trained an AI model on 120M+ songs from iTunes and came up with Maroofy: an engine to discover similar music. It’s a “proprietary AI model” that spits out a mixed bag of results.

Whatever the models behind it, I like Music-Map recommendations; it's a bit similar to, although simpler than, MusicBrainz and Pandora's Music Genome Project.

The next in line is AI Generative TV & Films. I've watched several episodes of AI Generated Seinfeld, which runs non-stop, 24/7 on Twitch. Watch here. The video and sound quality is rubbish… the out-of-sync audience laughs… the sketchy movement of the characters… Everything is so bad that it's so good, and addictive. LOL!

The weekend read. My friend Derek -who is an inveterate hackathon coder- has published a new version of his book: Evidence-based Software Engineering (pdf, data, slides, papers, code). His main thesis is: "software effort estimation is mostly fake research." He provides evidence to support that statement. Enjoy the read.

Have a nice week.

Thanks for reading Data Machina! Subscribe free to receive new posts every week.

10 Link-o-Troned

  1. A Conceptual Guide to Transformers

  2. The Transformers Family, Version 2.0 Jan 2023

  3. A Dive into Vision-Language Models

  4. Google Research: ML Systems for Complex Models

  5. Recent Advances in Efficient & Scalable Graph Neural Nets

  6. Blueprints for Recommender Systems: 10th Anniversary

  7. One ML Framework for Micromodels up to LLMs

  8. Improving Search Rankings with Few-Shot Prompting of LLM

  9. Unleashing ML Innovation with Opensource Ray @Spotify

  10. IBM Neuro-Symbolic AI Workshop, Feb 2023 [Recorded Sessions]


Share Data Machina with your friends

the ML Pythonista

  1. Google Vizier: Open Source Blackbox & Hyperparam Optimisation

  2. Implementing DreamBooth with KerasCV & TensorFlow

  3. Querying NBA Stats with GPT-3, Statmuse & Langchain

the ML codeR

  1. Neptune-r: Model Registry & MLOps for ML Teams

  2. Machine Shop: ML Models & Tools for R

  3. Algorithms & Music: A Love Story

Deep & Other Learning Bits

  1. Stanford NLP with Deep Learning (Lectures & Notes, 2023)

  2. [Free book] MIT Understanding Deep Learning (Feb 2023)

  3. A Comprehensive Survey of Continual Learning

AI/ DL ResearchDocs

  1. Multimodal CoT Reasoning: New SoTA in Language Models

  2. Dreamix: Video Diffusion Models are General Video Editors

  3. BLIP-2: Zero-shot Instructed Vision-to-Language Generation

El Robótico

  1. Waymo Research: Behavior Models for Autonomous Driving

  2. Robotic Automated Trailer Unloading @DHL

  3. Learning Universal Policies via Text-Guided Video Generation

data v-i-s-i-o-n-s

  1. The 1st Interactive Map with AI-Detected Fields & Crops

  2. Visualising the Pathway to AGI over 25 Years

  3. [Interactive] Visualising Income in 166 Countries, 33 metrics

DataEng Wranglings

  1. Data Engineering Tricks with the Pythonic Versatile Data Kit (VDK)

  2. Deploy your Own Databricks Feature Store on Azure with Terraform

  3. SQL Should Be the Default Choice for Data Engineering Pipelines

AI startups -> radar

  1. Olive Diagnostics - AI for Urine Analysis

  2. Scenario - AI Generated Game Assets

  3. Flawless - Generative AI for Filmmaking

ML Datasets & Stuff

  1. The Lichess Open DB - 4 Billion Chess Games

  2. RealTalk Dataset: 692 in-the-wild videos with visual embeddings

  3. Downstream Datasets Make Surprisingly Good Pretraining Corpora

Postscript, etc

Enjoyed this post? Feel free to share it.


Tips? Suggestions? Feedback? email Carlos

Curated by @ds_ldn in the middle of the night.

© 2023 Data Machina