Data Machina #179

RLHF. AlphaTensor explained. What's wrong with backpropagation? The accuracy of AlphaCode. Transfer learning & domain adaptation. Deep probabilistic models. Causal deep learning.

Dec 11, 2022

Reinforcement Learning from Human Feedback (RLHF). The release of live LLM demos like ChatGPT, and the open-sourcing of generative models like Stable Diffusion, are triggering a lot of new research on AI Alignment and Responsible ML.

Is RLHF coming to the rescue? The idea is to align Foundation Models/LLMs with human preferences, instead of having them imitate [bad] human behaviour. Working together, OpenAI & DeepMind pioneered RLHF in Learning from Human Preferences.
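To make "aligning with human preferences" a bit more concrete, here is a minimal sketch of the preference-modelling step only: a reward model is fitted so that the response a human preferred scores higher than the one they rejected, using a Bradley-Terry style probability. Everything here is an illustrative assumption rather than code from any of the linked posts: the function names, the linear reward model, and the toy feature vectors are hypothetical, and the later RL fine-tuning step is not shown.

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over response B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice under the model above."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

def fit_linear_reward(pairs, lr=0.1, steps=200):
    """Fit a toy linear reward r(x) = w . x by gradient descent.

    `pairs` is a list of (chosen_features, rejected_features) tuples.
    Both the linear model and the features are assumptions for illustration.
    """
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-margin))  # P(chosen preferred)
            # Gradient of -log(p) with respect to w is -(1 - p) * diff.
            for i in range(dim):
                grad[i] -= (1.0 - p) * diff[i]
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

# Toy data: humans preferred responses with a high first feature.
toy_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
weights = fit_linear_reward(toy_pairs)
```

In a full RLHF pipeline the learned reward would then be used as the training signal for an RL algorithm that fine-tunes the language model; this sketch stops before that stage.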

Back in May, Ash, an AI researcher @Google, published a great post on How RLHF works, its challenges, and future research directions.

A community driven by @CarperAI & @EleutherAI is focusing on improving the performance and safety of LLMs with RLHF. In this cool post, Illustrating Reinforcement Learning from Human Feedback (RLHF), these researchers walk through RLHF step by step, cover some open-source RLHF tools, and discuss what’s next for RLHF.

Many of the challenges in RLHF stem from the instability of RL algorithms and the huge size of the combinatorial space. In Is Reinforcement Learning (Not) for NLP? a team @AllenAIInstitute et al. provides an overview of how to address these challenges.

If you are terribly bored this Sunday, here are some suggestions:

YouTube Whisperer: generate speech-to-text transcriptions of YouTube videos using the latest v2.0 of OpenAI's Whisper.
Story_and_Video_Generation: generate stories and videos from text using GPT-J, Latent Diffusion, and FILM.

Have a nice week.

10 Link-o-Troned

A Pythonista Experience

Scripting aRt

Deep & Other Learning Bits

ResearchDocs

El Robótico

data v-i-s-i-o-n-s

DataEng Wranglings

startups -> radar

ML Datasets & Stuff

Postscript, etc

Tips? Suggestions? Feedback? Email Carlos.

Curated by @ds_ldn in the middle of the night.
