Data Machina #179
RLHF. AlphaTensor explained. What's wrong with backpropagation? The accuracy of AlphaCode. Transfer learning & domain adaptation. Deep probabilistic models. Causal deep learning.
Reinforcement Learning from Human Feedback (RLHF). The release of live LLM demos like ChatGPT and the open-sourcing of generative models like Stable Diffusion are triggering a lot of new research on AI Alignment and Responsible ML.
Is RLHF coming to the rescue? The idea is to align Foundation Models/LLMs with human preferences instead of having them imitate [bad] human behaviour. Working together, OpenAI & DeepMind pioneered RLHF in Learning from Human Preferences.
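The core trick in Learning from Human Preferences is to fit a reward model to pairwise human comparisons, so that the preferred item of each pair scores higher. Below is a minimal sketch of that idea. It assumes a toy linear reward over hypothetical hand-made feature vectors, standing in for the neural reward model a real system would use, trained with a Bradley-Terry style logistic loss. All names and data here are illustrative.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reward(weights: list[float], features: list[float]) -> float:
    """Hypothetical linear reward model: r(x) = w . phi(x)."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(pairs, dim, lr=0.1, epochs=200, seed=0):
    """Fit weights so that the human-preferred item of each pair scores higher.

    pairs: list of (preferred_features, rejected_features) tuples.
    Per-pair loss (Bradley-Terry): -log sigmoid(r(preferred) - r(rejected)).
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(pairs)  # shuffle a copy, leave the caller's list untouched
    for _ in range(epochs):
        rng.shuffle(data)
        for preferred, rejected in data:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Gradient step on the loss:
            # w += lr * (1 - sigmoid(margin)) * (phi_preferred - phi_rejected)
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights


# Made-up comparisons: feature 0 ~ "helpfulness", feature 1 ~ "toxicity".
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.9, 0.2], [0.9, 0.8]),
]
weights = train_reward_model(pairs, dim=2)
print(weights)  # helpfulness weight > 0, toxicity weight < 0
```

In this toy data the "humans" always prefer more helpfulness and less toxicity, so the learned weights come out positive and negative respectively. Swapping the linear model for a neural network changes the model, not the loss.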
Back in May, Ash, an AI researcher @Google, published a great post on How RLHF works, its challenges, and future research directions.
A community driven by @CarperAI & @EleutherAI is focusing on improving the performance and safety of LLMs with RLHF. In this cool post, Illustrating Reinforcement Learning from Human Feedback (RLHF), these researchers walk through RLHF step by step, review some open-source RLHF tools, and discuss what's next for RLHF.
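The step-by-step recipe ends with RL fine-tuning (typically PPO). There, the reward fed to the policy is the reward-model score minus a KL penalty that keeps the tuned model close to the original one. Below is a minimal sketch of just that reward shaping. It assumes you already have per-token log-probs of the sampled response under both models. The function name and the `beta` default are hypothetical, and the PPO update itself is not shown.

```python
def rlhf_reward(preference_score: float,
                policy_logprobs: list[float],
                ref_logprobs: list[float],
                beta: float = 0.02) -> float:
    """Reward-model score minus a KL penalty (illustrative sketch).

    policy_logprobs / ref_logprobs: log-probabilities of each sampled token
    under the model being tuned and under the frozen reference model.
    beta: penalty strength (hypothetical default).
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Sample-based estimate of KL(policy || reference) for this response:
    # sum over tokens of log pi(token) - log pi_ref(token).
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return preference_score - beta * kl_estimate


# Identical models: no penalty, the reward is just the preference score.
print(rlhf_reward(1.5, [-0.1, -0.2], [-0.1, -0.2]))  # 1.5
# The tuned model has drifted (far more confident than the reference),
# so the penalty pulls the reward down.
print(rlhf_reward(1.5, [-0.1, -0.2], [-1.1, -1.2], beta=0.5))  # approximately 0.5
```

The penalty is what stops the policy from "gaming" the reward model with text the original LM would never produce. `beta` trades off reward-chasing against staying close to the reference model.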
Many of the challenges in RLHF stem from the instability of RL algos and the huge size of the combinatorial action space. In Is Reinforcement Learning (Not) for NLP?, a team @AllenAIInstitute et al. provide an overview of how to tackle these challenges.
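To get a feel for that combinatorial space: with a vocabulary of |V| tokens and outputs of T tokens, there are |V|^T possible sequences. The paper's NLPO algorithm tackles this partly by masking out less relevant tokens. The sketch below does not implement NLPO. It only gives the back-of-envelope count plus plain top-p (nucleus) masking, one simple way to shrink the per-step action set. The 50,257-token vocabulary is GPT-2's; the 20-token length and the `p` values are arbitrary choices for illustration.

```python
import math


def log10_num_sequences(vocab_size: int, length: int) -> float:
    """log10 of |V|^T, the number of distinct token sequences of length T."""
    return length * math.log10(vocab_size)


def top_p_mask(probs: list[float], p: float = 0.9) -> list[bool]:
    """Keep the smallest set of most-likely tokens whose total probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, total = set(), 0.0
    for i in order:
        keep.add(i)
        total += probs[i]
        if total >= p:
            break
    return [i in keep for i in range(len(probs))]


# A GPT-2-sized vocabulary and 20-token outputs: roughly 10^94 sequences.
print(round(log10_num_sequences(50_257, 20), 1))  # 94.0
# Top-p masking over a toy 4-token distribution keeps only the two likeliest tokens.
print(top_p_mask([0.5, 0.3, 0.15, 0.05], p=0.75))  # [True, True, False, False]
```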
If you are terribly bored this Sunday, here are some suggestions:
Youtube Whisperer: generate speech-to-text transcriptions of YouTube videos using the latest v2.0 of OpenAI's Whisper.
Story_and_Video_Generation: generate stories and videos from text using GPT-J, Latent Diffusion, and FILM.
Have a nice week.
10 Link-o-Troned
A Pythonista *Experience*
A Transformer Framework for Non-stationary Time Series Forecasting
DeeProbKit - A Unified Lib for Deep Probabilistic Models (DPM)
Scripting aRt
Deep & Other Learning Bits
ResearchDocs
El Robótico
data v-i-s-i-o-n-s
The Economist - The European Death Toll of the Energy Crisis
[Interactive] Simulate the Impact of an Asteroid Hitting the World
DataEng Wranglings
startups -> radar
ML Datasets & Stuff
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.