Data Machina #187

AI Generative Music: The Latest. Transformers Family v.2. ML Micromodels. Dreambooth in Keras. SoTA Multimodal CoT Reasoning LM. BLIP-2 Zero shot vision to language. Diffusion-based video editing.

Feb 05, 2023

On the Latest in AI Generative Music. Music to your ears? AI Generative Music is perhaps a bit behind AI Generative Text. Generating high-quality, long musical pieces is still a challenge. But recently, some amazing research has emerged addressing that challenge. This is my best summary of what happened in AI Generative Music between October 2022 and February 2023.

In late October, researchers at the startup @Mubert published a demo: Mubert-Text-to-Music. It is probably one of many good examples of what’s coming in AI Generative Music.

Many people complain that AI-generated music doesn’t sound very human. A team @UniCampBR introduced a new approach to generating music with sentiment using Transformer-GANs. The model generates music conditioned on human affective states.

If you are interested in affective AI music generation, this is a good read: AI-Based Affective Music Generation Systems: A Review of Methods, and Challenges

Computation in any type of Generative AI is super expensive. @kinyugo, an independent AI researcher, released a novel diffusion-based model for generating long-context, hi-fi music efficiently. Check out Msanii: Hi-Fi Music Generation on a Shoestring. (paper, demo & notebook)

Due to the way generative AI models work, another issue in AI generative music is generating pieces of long enough duration. Long-form generative AI text has a similar issue. @abhinav @Purdue_Uni introduced a new Multi-Genre Transformer that generates full-length, new musical pieces.

Last week, I mentioned Google Research’s MusicLM: Generating Music from Text, which claims a new SoTA in AI music generation.

Google Research didn’t open source MusicLM’s code, but they did open source the MusicCaps dataset. It contains 5,521 music examples, each of which is labeled with an English aspect list and a free-text caption written by musicians.

A few days ago, the amazing @lucidrains released a PyTorch (WIP) implementation of Google MusicLM.

Generating high-quality, high-fidelity music with AI is a very complex task. Music generation requires dealing with key aspects like pitch variance, rhythm, temporal dimension, long-term structure, and multiple layers of overlapping sounds.

The team @Acids_Ircam published the official Python code of a new model that solves some of the inherent issues in AI music generation. Check out: RAVE: A variational autoencoder for fast and high-quality neural audio generation.

It seems that several independent researchers have been able to generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Check out: Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion (paper, code, demo)

Some anonymous researcher(s) published Noise2Music, which is a series of diffusion models trained to generate high-quality 30-second music clips from text prompts.

A team from several UK unis has claimed SoTA with a text-to-audio model trained on a single GPU. Check out the paper, demo & code here: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models.

I must admit that I still have recurring nightmares from participating in some notorious performances in karaoke Star Pubs around the City of London. Well, it happens that some researchers have come up with a sort of reverse AI karaoke. See: SingSong: Generating musical accompaniments from singing. More AI singing nightmares?

Not sure about you, but I’ve got this compulsive behaviour of frantically skipping my Spotify stream recommendations whenever I don’t like them, which is quite often. So I wasn’t surprised when I came across: Why People Skip Music? On Predicting Music Skips using Deep Reinforcement Learning

Collaborative Filtering and Matrix Factorisation have been used in music recsys for many years. But this type of recsys approach has known issues like cold start, sparsity, lack of serendipity, and so on.
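To make the cold start issue concrete, here is a minimal sketch of matrix factorisation trained with plain SGD. Everything in it is an assumption for illustration: the toy scores, the function names (factorise, predict) and the hyperparameters are hypothetical and not taken from any of the systems mentioned here. User 2 has no listening history, so their factors never move away from their random initial values.

```python
import random


def factorise(ratings, n_users, n_items, k=2, epochs=2000, lr=0.02, reg=0.02, seed=0):
    """Learn user factors P and item factors Q by SGD on observed triples only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted score is the dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


# Hypothetical toy data: (user, track, score). User 2 has no history at all.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (0, 2, 1.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = factorise(ratings, n_users=3, n_items=3)

print("known user 0:", [round(predict(P, Q, 0, i), 2) for i in range(3)])
print("cold user 2: ", [round(predict(P, Q, 2, i), 2) for i in range(3)])
```

The known users get predictions close to their observed scores, while the cold user only gets near-zero noise. That gap is one reason hybrid and content-aware recommenders are attractive.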

Now, a lot of music recsys research is focused on neural recommenders involving neural embeddings, or hybrid recsys models, or graph neural nets. This is an example of a hybrid approach that uses an item-based variational auto-encoder (VAE) with Bayesian personalized ranking matrix factorization (BPRMF) that produces more accurate, fair and diverse music recommendations.
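For readers who haven't met Bayesian personalized ranking before, this is a rough sketch of the BPR-MF half only. It is not the hybrid VAE model from the paper: the toy listening data, the train_bpr and score names, and the hyperparameters are all hypothetical. The idea is to sample a track the user played (i) and one they did not (j), then nudge the factors so that i scores higher than j.

```python
import math
import random


def train_bpr(positives, n_users, n_items, k=4, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Fit user/item factors by ascending ln(sigmoid(x_ui - x_uj)) on sampled triples.

    positives maps each user to the set of items they interacted with.
    Every user must have at least one non-interacted item to sample from.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u in sorted(positives):
            for i in sorted(positives[u]):
                # Sample a negative item j the user has not interacted with.
                j = rng.randrange(n_items)
                while j in positives[u]:
                    j = rng.randrange(n_items)
                x_uij = sum(U[u][f] * (V[i][f] - V[j][f]) for f in range(k))
                g = 1.0 / (1.0 + math.exp(x_uij))  # derivative of ln(sigmoid(x))
                for f in range(k):
                    uf, vi, vj = U[u][f], V[i][f], V[j][f]
                    U[u][f] += lr * (g * (vi - vj) - reg * uf)
                    V[i][f] += lr * (g * uf - reg * vi)
                    V[j][f] += lr * (-g * uf - reg * vj)
    return U, V


def score(U, V, u, i):
    """Ranking score for user u and item i."""
    return sum(a * b for a, b in zip(U[u], V[i]))


# Hypothetical implicit feedback: user 0 played tracks 0 and 1, user 1 played 2 and 3.
positives = {0: {0, 1}, 1: {2, 3}}
U, V = train_bpr(positives, n_users=2, n_items=4)

for u in sorted(positives):
    ranking = sorted(range(4), key=lambda i: score(U, V, u, i), reverse=True)
    print("user", u, "ranking:", ranking)
```

Unlike the squared-error objective used in classic matrix factorisation, this loss only cares about the relative order of items, which suits implicit signals such as plays and skips.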

Although not specific to music recsys, I enjoyed reading: A Survey of Graph Neural Networks for Recommender Systems: Challenges, Methods, and Directions.

I read that @subtech trained an AI model on 120M+ songs from iTunes and came up with Maroofy: an engine to discover similar music. It’s a “proprietary AI model” that spits out a mixed bag of results.

Whatever models are behind it, I like Music-Map recommendations. It’s a bit similar to, although simpler than, MusicBrainz and Pandora’s Music Genome Project.

Next in line is AI Generative TV & Films. I’ve watched several episodes of AI Generated Seinfeld, which runs non-stop, 24/7 on Twitch. Watch here. The video and sound quality is rubbish… the out-of-sync audience laughs… the sketchy movement of the characters… Everything is so bad that it’s so good, and addictive. LOL!

The weekend read. My friend Derek, who is an inveterate hackathon coder, has published a new version of his book: Evidence-based Software Engineering (pdf, data, slides, papers, code). His main thesis is: “software effort estimation is mostly fake research.” He provides evidence to support that statement. Enjoy the read.

10 Link-o-Troned

the ML Pythonista

the ML codeR

Deep & Other Learning Bits

AI/ DL ResearchDocs

El Robótico

data v-i-s-i-o-n-s

DataEng Wranglings

AI startups -> radar

ML Datasets & Stuff

Postscript, etc