Discover more from Data Machina
Data Machina #191
Prompt Engineering: A Random Tour. Model-agnostic ML. The Annotated Transformer. MLSys. SpikeGPT. Directed Diffusion. Google Flan-UL2 20B
On Prompt Engineering & GPT Models: A Random Tour. Many colleagues say that prompting and prompt engineering are a “bug of LLMs” and a “fad that will fizz out as soon LLMs get better,” I disagree. Prompt engineering is much more than querying a bot with some clever keywords. Prompt engineering is here to stay.
To set the scene of this post, I really enjoyed reading On Prompt Engineering as it conveys many of the ideas I have on this topic.
Prompt engineering is evolving fast, and becoming a sophisticated skill. Prompting is a natural programming language too:
In a recent webinar, Chris Potts @StanfordNLP talked about old-school prompting and (cutting edge!) step-by-step prompting. Checkout the slides: Beyond GPT-3: Key concepts and open questions in a golden age for Natural Language Understanding.
Prompt engineering requires: domain expertise, understanding how LLMs react to prompts, structured thinking, a problem solving mindset, and excellent communication skills. Prem’s essay it’s spot on: Prompt Design: Programming with Plain Text, and Prompt Tuning: Learnable Prompts.
A friend tells me that Anthropic - a startup leader in LLMs- is struggling to recruit a Prompt Engineer for a $175k - $335k/year salary, despite receiving hundreds of applications.
So the new ChatGPT API is powered by the new gpt-turbo-3.5 model. Bihan @Scale wrote about why ChatGPT API is much faster, much cheaper, and much wordier than ChatGPT Web UI powered by text-davinci-003 model. Some interesting insights on the quality of outputs generated.
Open AI posted new documentation on several things that have changed with ChatGPT API and gpt-turbo-3.5. For example: instruction prompts, chat completion, fine tuning, and the usage of your data.
You can test how prompting has changed (or not) in your experiments with the new gpt-3.5-turbo model with this ChatGPT Demo in Gradio. It uses the new ChatGPT API and prompt templates from awesome-chatgpt-prompts.
Understanding the relationship between tokens, prompting, and costs is also crucial when running GPT models. Roughly, computing 1 million tokens with the new ChatGPT API will cost you… two dollars… so cheap!
A colleague tells me that it may cost just ~$8,400 to gpt-generate the 4.2 billion words of Wikipedia! Tabarak @OpenAI wrote on: What are tokens and how to count them?
But watch out, because behind the speed and cheap compute of the new ChatGPT API, there is a trade-off. To wit: A lot of engineers, startups that built apps using text-davinci-003 model, have reported issues when migrating/refactoring to gpt-turbo-3.5.
@baobabKoodaa published a notebook on migrating from davinci-003 to gpt-3.5-turbo-0301 with prompt instructions and prompt examples.
In another thread by Han - the creator of PromptPerfect, an awesome tool- he claims that the migration from davinci003 to gpt35-turbo actually made the generated content quality worse in many cases.
Investigating all these issues above, it seems that to achieve faster, cheaper results in the new ChatGPT API, the fingers point to LLM distillation and model compression.
And fingers also point to how AI devs are tweaking the new gpt-3.5-turbo model params like: temperature, frequency & presence penalties, instruction prompting, zero-shot prompts, and so on. Although not tweaked for gpt-3.5 turbo, the Interactive guide to GPT-3 prompt parameters is a great tool.
Prompting, SQL, and data analysis. Arguably, SQL is the most used language for data analysis. If you are a junior to mid-level data analyst specialised in SQL, the corporate grim reaper will try to automate your job sooner than you expect.
Have you tried using GPT/Codex models to execute SQL queries? Here is a nice post on setting up a GPT-SQL playground to making a production LLM prompt for text-to-SQL translation.
Some prompting tools and more. Last week, I wrote about The Modern AI Stack and some prompt engineering tools like: Everyprompt, Promptify, OpenPrompt, Betterprompt, Soaked, Dyno… Here are a few more:
Prompt management made easy with Promptly.
Discover, learn, and test different ChatGPT prompts in Prompt Vibes
PromptGym, play around with prompts for text generation, testing different models and different prompts. And evaluating prompts according to different criteria
CyberSec and prompt injections. As LLM-based apps -like ChatGPT API apps- spread everywhere, I reckon companies will pay big bucks for expertise on prompt engineering against prompt injections.
This is a nice paper on the subject: A Comprehensive Analysis of Novel Prompt Injection Threats to App-Integrated LLMs
Here you can read how to use indirect prompt Injection threats to turn Bing Chat into a data pirate. Also see many examples of jailbreaking prompts here: Jailbreak Chat.
Some recent research on prompt engineering. There is a barrage of research coming from left, right and centre. Here’s my 2 cents shortlist with some interesting papers published in February:
A Prompt Pattern Catalog, a way to enhance prompt engineering with ChatGPT for tasks like software engineering and improve chat output generation.
In-Context [Prompt] Instruction Learning, how to apply in-context learning to prompt instruction learning and improve text-davinci-003 instruction-fine-tuned baseline
Prompting LLMs To Do Science, prompt the LLM with a research goal using 2 large text corpora. Prompt the LLM for an hypotheses. The LLM then proposes a relevant, significant hypotheses. Apply this to evaluate 675 problem tasks in health, business, etc
Active Prompting with CoT for LMs, a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed Chain-of-Thought reasoning)
Reward Design with Language Models, how to simplify reward design by prompting a LLM like GPT-3 as a proxy reward function
EvoPrompting, combining evolutionary neural architecture search (NAS) with prompting to design novel architectures that outperform current SoTA models on algorithmic reasoning tasks
Inevitably, if you are drifting away on a lazy Sunday, I have some prompting pastimes for you:
Prompt the near dystopian future: It’s the year 2030. You need social credits to conduct your daily live. It’s easy: just get a WeChatGPT+ premium subscription
Chat with a legendary startup investor: Paul Graham GPT, an oss app built with gpt-turbo-3.5, Open AI embeddings & vector search with pgvector & Supabase
Test your prompt engineering skills playing Prompt Engineering Chess
Have a nice week.
10 Link-o-Troned
Prior Attempts at Faithful NLP Explanations (pdf, 96 slides)
Is My NLP Model Working? The Answer is Harder Than You Think (slides)
the ML Pythonista
the ML codeR
Deep & Other Learning Bits
AI/ DL ResearchDocs
El Robótico
data v-i-s-i-o-n-s
MLOps Untangled
AI startups -> radar
ML Datasets & Stuff
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.
Subscribe to Data Machina
A weekly deep dive into the latest AI / ML research, projects & repos.