Data Machina #183
AI Pair-programming. Code prompting. Neuroscience + neural nets. Graph transformers. Neural search. Neural collaborative filtering. Diffusion GANs. Google Muse. MS Research VALL-E.
On AI Pair-Programming, Code Prompting & LMs. I’ve been doing research for a company on all this stuff. The more I learn, the more I believe AI is going to massively disrupt software engineering, very soon. Sooner than you may think.
“Copilot has dramatically accelerated my coding, it's hard to imagine going back to "manual coding". Still learning to use it but it already writes ~80% of my code, ~80% accuracy. I don't even really code, I prompt. & edit.”
Andrej Karpathy
I met Andrej when he was at Stanford doing research on RNNs, and invited him to give a talk at Data Science London. IMO he is one of the top AI engineers & researchers in the world.
So: The leading AI/ML engineers are now using ChatGPT for code discovery & prototyping, CoPilot for code generation, or both. Check out: Code Generation: Comparing ChatGPT and CoPilot
Understanding how CoPilot and Codex (the LM behind it) interpret prompts is crucial. @ParthThakkar, who is working on ML Program Synthesis, has reverse-engineered CoPilot. In CoPilot Internals, he explains the secret sauce of CoPilot prompting.
@ColinFortuner, in his post AI Coding Checkpoints, hypothesises about a really good prompt that could get ChatGPT to produce a perfect set of code and commit it to GitHub automatically.
Back in October, a team @INRIA investigated the importance of prompt temperature and prompt engineering variation for obtaining coding results with 70-99% accuracy. See: Piloting CoPilot & Codex: Hot Temperature, Cold Prompts, or Black Magic?
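If "temperature" sounds like black magic, here is a minimal sketch of what the knob usually does in a language model: the scores (logits) for each candidate token are divided by the temperature before being turned into probabilities. This is a generic illustration in plain Python, not the INRIA team's setup nor CoPilot's actual implementation; the token list and logit values are made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for the next token, with made-up scores.
tokens = ["return", "print", "raise"]
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # low temperature: near-greedy
hot = softmax_with_temperature(logits, 2.0)   # high temperature: more varied

rng = random.Random(0)
picked = sample_token(tokens, logits, 2.0, rng)
```

A low temperature concentrates almost all the probability on the top-scoring token, so completions are repeatable; a high temperature spreads it out, so repeated runs explore more alternatives.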
Indeed, knowing how to write good coding prompts is key. The leading AI/ML engineers are using the prompt libraries below in combination with CoPilot/Codex & ChatGPT to obtain better coding results from prompting:
OpenPrompt - a PyTorch library for prompt-learning, a new paradigm
PromptSource - a Python toolkit for creating, sharing and using prompts
betterprompt - a library for testing LM prompts
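To give a flavour of what a coding prompt can look like, here is a small sketch in plain Python that assembles a function signature, a docstring and a couple of examples into one prompt string for a code model to complete. It does not use any of the libraries above, and build_code_prompt and its parameters are hypothetical names invented for this illustration.

```python
def build_code_prompt(signature, description, examples=()):
    """Assemble a docstring-style prompt that a code model is asked to complete.

    signature:   the function header, e.g. "def dedupe(items):"
    description: one sentence saying what the function should do
    examples:    (call, expected_result) pairs, rendered as doctest lines
    """
    lines = [signature, '    """' + description]
    for call, result in examples:
        lines.append(f"    >>> {call}")
        lines.append(f"    {result}")
    lines.append('    """')
    # The prompt ends right where the function body would start.
    return "\n".join(lines) + "\n"


prompt = build_code_prompt(
    "def dedupe(items):",
    "Return the items with duplicates removed, keeping first-seen order.",
    examples=[("dedupe([3, 1, 3, 2, 1])", "[3, 1, 2]")],
)
print(prompt)
```

The idea is that a clear signature, a precise description and concrete examples leave the model less room to guess; how much this helps in practice will depend on the model and the task.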
If you want to learn more about Prompt Engineering, this is a really great, comprehensive overview: A Complete Introduction to Prompt Engineering for Large Language Models.
Soon, OpenAI will add to ChatGPT what they call Prompt Palettes, which are pre-written prompts for all sorts of tasks. CoPilot Labs has already added a feature called Brushes that lets you make code cleaner and more robust, and document it automatically.
Want some ideas on using CoPilot/Codex and ChatGPT for coding? Here are a few:
Introducing Infinite AI Array to ChatGPT for solving typical Python issues on lists, dictionaries and types. So brilliant!
Code explanation, code translation and custom prompts with CoPilot Labs
I know many AI/ML engineers who are very negative and dismissive about AI Pair-Programming and Code Prompting… “It’s just a fad. Plus the AI makes errors…”
In Exploring the Verifiability of Code Generated by GitHub Copilot, the researchers conclude that CoPilot generates verifiable code in low & medium coding tasks.
But the point is not about the AI making errors. Sure! CoPilot makes errors; human coders make errors too. And of course, you need to be a good coder to use AI Pair-programming & Code Prompting.
The point is about AI coding superpowers, reducing coding time drastically…automating testing, documentation, bug fixing, refactoring… Becoming a code prompter and editor supported by AI, instead of just being a code writer aided by Google Search & StackOverflow…
I don't even really code, I prompt & edit…
StackOverflow has already banned posting ChatGPT-generated content. ICML 2023 is prohibiting the use of LLMs like ChatGPT to write academic papers. The NYC Ed. Dept. has banned ChatGPT across all NYC schools too.
I guess all that is futile gatekeeping. The barbarians are arriving. Eventually, AI/ML engineers will become supervisors & managers of tribes of AI programming bots.
“Programming will be obsolete. I believe the conventional idea of "writing a program" is headed for extinction, and indeed, for all but very specialized applications, most software, as we know it, will be replaced by AI systems that are trained rather than programmed.”
The End of Programming, Communications of the ACM, Jan 2023
How long until then? I reckon much sooner than we expect. What do you think? I’m really interested in your input and comments.
Have a nice week.
[awesome] Some Remarks on Language Models
A Pythonista *Experience*
How-to: Build a Neural Collab-Filtering RecSys in Tensorflow
NeuralFit - An Easier Way to Complex Neuro Evolutionary Nets
Deep & Other Learning Bits
Intro to RL with Human Feedback (RLHF): From Zero to chatGPT
Clickhouse Local - A Small Serverless SQL Tool for Data Engineers
AI startups -> radar
ML Datasets & Stuff
SODA - 1st Public, Large-scale Dataset on Social Interactions
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.