Data Machina #207
Steering LLMs. LMQL. Guardrails. Direct Preference Optimization. Sequential Monte Carlo. Fast SAM. AudioPaLM. MAGVIT. New Midjourney 5.2. How RLHF works.
Taming the Stochastic Beast: Beyond Prompting. I’m at this summer party mingling with smart business & tech people. Everyone is merrily sipping potent cocktails while proclaiming their seemingly vast expertise in ChatGPT prompting. I approach a group of three blokes and, just for fun, I ask: “How would you control the outputs of a stochastic parrot?” I get blank faces. Then one of them asks me: “Another Daiquiri perhaps?”
A while ago, researchers at Berkeley & Cornell Unis published this paper: Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. Most definitely, I’m going to send the paper to the people who attended the summer party.
LLMs in business. It’s one thing to run fancy innovation experiments with ChatGPT in your company. It’s quite another to consistently get accurate, reliable, trustworthy outputs from an LLM in production, at scale, cost-efficiently.
Beyond prompting: steering. Clever or advanced prompting (in-context learning, CoT, ToT, ReAct, self-consistency…) won’t always be enough to get accurate, reliable outputs consistently. And fine-tuning, reward modelling, and (for the rich) RLHF may still not deliver what your business needs. For this reason, many investors, researchers and companies are setting their sights on “LLM steering”: a series of new methods and tools to improve LLM consistency and reliability in a time- and cost-efficient, scalable way.
Detecting problematic prompts. A key part of your LLMOps should be monitoring the performance of your model’s prompt/response embeddings. Here is a new tool to monitor LLM outputs and detect problematic prompts.
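To make the idea concrete: one simple way to catch problematic prompts is to embed them and flag outliers relative to a trusted reference set. This is a minimal sketch of that pattern, not the linked tool’s API; the embeddings here are random stand-ins for whatever encoder you use.

```python
import numpy as np

def flag_outlier_prompts(ref_embeddings: np.ndarray,
                         new_embeddings: np.ndarray,
                         z_threshold: float = 3.0) -> np.ndarray:
    """Flag prompts whose embedding sits unusually far from the centroid
    of a trusted reference set -- a crude drift/outlier check."""
    centroid = ref_embeddings.mean(axis=0)
    ref_dists = np.linalg.norm(ref_embeddings - centroid, axis=1)
    mu, sigma = ref_dists.mean(), ref_dists.std()
    new_dists = np.linalg.norm(new_embeddings - centroid, axis=1)
    # Large z-score => prompt looks nothing like the known-good traffic
    return (new_dists - mu) / (sigma + 1e-8) > z_threshold

# Stand-in data: in practice, embed prompts with any sentence encoder.
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 384))                       # "known-good" prompts
new = np.vstack([rng.normal(size=(5, 384)),
                 rng.normal(loc=4.0, size=(2, 384))])   # two drifted prompts
print(flag_outlier_prompts(ref, new))                   # last two flagged
```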
Start by grounding the LLM. Grounding is the process of using LLMs with information that is use-case-specific, relevant, and not available as part of the LLM’s trained knowledge. Researchers at MSR suggest that you first explore the possibilities of Retrieval Augmented Generation (RAG) before resorting to fine-tuning.
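At its core, RAG is just “retrieve relevant chunks, stuff them into the prompt”. Here’s a minimal sketch assuming you already have embedded document chunks; the embedding model and the LLM call are placeholders, not any specific library’s API.

```python
import numpy as np

def retrieve(query_emb: np.ndarray, doc_embs: np.ndarray,
             docs: list[str], k: int = 3) -> list[str]:
    # Cosine similarity between the query and every document chunk
    sims = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def grounded_prompt(question: str, context_chunks: list[str]) -> str:
    # Stuff retrieved chunks into the prompt so the LLM answers from
    # *your* data instead of its parametric memory
    context = "\n".join(f"- {c}" for c in context_chunks)
    return ("Answer using ONLY the context below. If the answer is not in "
            f"the context, say 'I don't know'.\n\nContext:\n{context}\n\n"
            f"Q: {question}\nA:")
```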
Use a DSL for constraining LLM outputs. Another approach is to use a DSL to specify high-level, logical constraints over the LM output. This makes it possible to enforce many constraints strictly, so the model cannot generate content that violates the requirements. Check out: LMQL: A programming language for LLMs
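LMQL has its own query syntax; as a rough illustration of the underlying mechanism (constraint-checked decoding, where violating continuations are pruned during generation), here’s a toy Python sketch. The “LM” below is a fake that always proposes the same ranked tokens.

```python
from typing import Callable, Iterable

def constrained_decode(
    candidates: Callable[[str], Iterable[tuple[str, float]]],
    constraint: Callable[[str], bool],
    max_tokens: int = 20,
) -> str:
    """Greedy decoding that only accepts tokens keeping the partial output
    valid under `constraint` -- the core idea a DSL like LMQL builds on."""
    out = ""
    for _ in range(max_tokens):
        for token, _score in candidates(out):  # candidates sorted best-first
            if constraint(out + token):        # prune violating continuations
                out += token
                break
        else:
            break  # nothing satisfies the constraint: stop generation
    return out

# Toy stand-in for an LM's next-token proposals (best-first).
toy_lm = lambda prefix: [("x", 0.6), ("4", 0.3), ("2", 0.1)]
digits_only = lambda s: s.isdigit() and len(s) <= 4
print(constrained_decode(toy_lm, digits_only))  # -> "4444"
```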
Put some guard-rails around the LLM. Check out this new Python package called Guardrails. It lets you add structure, type and quality guarantees to the outputs of LLMs, and also takes corrective actions (e.g. re-asking the LLM) when validation fails.
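This isn’t Guardrails’ actual API, but the validate-and-reask loop it automates looks roughly like this; `llm` and `validate` are hypothetical placeholders for your completion call and schema check.

```python
import json
from typing import Callable

def ask_with_validation(
    llm: Callable[[str], str],          # placeholder for any LLM call
    prompt: str,
    validate: Callable[[dict], None],   # raises ValueError on bad structure
    max_retries: int = 2,
) -> dict:
    """Parse and validate the LLM's JSON output; on failure, 're-ask' with
    the error message appended -- the corrective-action pattern."""
    current = prompt
    for _ in range(max_retries + 1):
        raw = llm(current)
        try:
            data = json.loads(raw)
            validate(data)
            return data
        except (json.JSONDecodeError, ValueError) as err:
            current = (f"{prompt}\n\nYour previous answer was invalid "
                       f"({err}). Respond again with valid JSON only.")
    raise RuntimeError("LLM output failed validation after retries")
```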
Apply DPO: a new policy algorithm. IMO this is a very important development. In late May, Stanford researchers introduced Direct Preference Optimization (DPO): a new stable, performant and computationally efficient policy-optimization algorithm that eliminates the need for fitting a reward model, sampling from the LM during fine-tuning, or performing significant hyperparameter tuning. Paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
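The whole method boils down to one loss. A sketch of the DPO objective from the paper, in PyTorch, where each input is the summed log-probability of a preferred/rejected response under the trainable policy or the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023). The implicit reward of a response
    is beta * (log pi_theta - log pi_ref); the loss pushes the preferred
    response's implicit reward above the rejected one's -- no separate
    reward model, no sampling from the LM during fine-tuning."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```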
Steer the LLM with Monte Carlo. Researchers at MIT just released a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. Using SMC you can steer LLMs to solve diverse tasks with greater control over the outputs. Paper, repo: Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs
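Schematically, SMC steering keeps a population of partial generations, reweights them by how well they satisfy a constraint, and resamples so compute concentrates on promising continuations. A toy sketch of that loop follows; it is not the paper’s probabilistic-programming API, and `propose` / `potential` are hypothetical stand-ins for an LM token sampler and a constraint score.

```python
import random
from typing import Callable

def smc_steer(propose: Callable[[str], str],      # samples one next token
              potential: Callable[[str], float],  # scores constraint fit >= 0
              n_particles: int = 8,
              steps: int = 10) -> str:
    """Schematic sequential Monte Carlo steering of text generation."""
    particles = [""] * n_particles
    for _ in range(steps):
        # 1. Extend each particle with a proposed next token
        particles = [p + propose(p) for p in particles]
        # 2. Reweight particles by how well they satisfy the constraint
        weights = [potential(p) for p in particles]
        if sum(weights) == 0:
            break  # every particle violates the constraint: give up
        # 3. Resample in proportion to the weights
        particles = random.choices(particles, weights=weights, k=n_particles)
    return max(particles, key=potential)
```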
Have a nice week.
10 Link-o-Troned
MAGVIT: An Amazing, New Video Transformer (demo, paper, code)
[free book] Information Theory: From Coding to Learning (pdf, 599 pages)
the ML Pythonista
Deep & Other Learning Bits
AI/ DL ResearchDocs
data v-i-s-i-o-n-s
MLOps Untangled
AI startups -> radar
ML Datasets & Stuff
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.