Data Machina

Data Machina #183

AI Pair-programming. Code prompting. Neuroscience + neural nets. Graph transformers. Neural search. Neural collaborative filtering. Diffusion GANs. Google Muse. MS Research VALL-E.

Carlos
Jan 8, 2023

On AI Pair-Programming, Code Prompting & LMs. I’ve been doing research for a company on all this stuff. The more I learn, the more I believe AI is going to massively disrupt software engineering, very soon. Sooner than you may think.

“Copilot has dramatically accelerated my coding, it's hard to imagine going back to ‘manual coding’. Still learning to use it but it already writes ~80% of my code, ~80% accuracy. I don't even really code, I prompt. & edit.” (Andrej Karpathy)

I met Andrej when he was at Stanford doing research on RNNs, and invited him to give a talk at Data Science London. IMO he is one of the top AI engineers & researchers in the world.

So: The leading AI/ML engineers are now using ChatGPT for code discovery & prototyping, CoPilot for code generation, or both. Check out: Code Generation: Comparing ChatGPT and CoPilot

Understanding how CoPilot and Codex (the LM behind it) interpret prompts is crucial. @ParthThakkar, who is working on ML program synthesis, has reverse-engineered CoPilot. In CoPilot Internals, he explains the secret sauce of CoPilot prompting.

In his post AI Coding Checkpoints, @ColinFortuner hypothesises about a really good prompt that could get ChatGPT to produce a perfect set of code and commit it to GitHub automatically.

Back in October, a team at INRIA investigated how much the sampling temperature and variations in prompt engineering matter for obtaining coding results with 70-99% accuracy. See: Piloting CoPilot & Codex: Hot Temperature, Cold Prompts, or Black Magic?
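
Some background on what "temperature" means here: when an LM samples its next token, the raw scores are divided by the temperature before being turned into probabilities. A low temperature concentrates probability on the top candidate; a high one spreads it out. Below is a minimal sketch of that idea. The scores and the helper name `softmax_with_temperature` are made up for illustration, and nothing here is taken from the INRIA paper or from Codex itself.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled)
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # sharp: the top token dominates
hot = softmax_with_temperature(logits, 2.0)   # flat: more variety when sampling

print("cold:", [round(p, 3) for p in cold])
print("hot: ", [round(p, 3) for p in hot])
```

In practice this shows up as a single `temperature` setting: a colder value gives more repeatable completions, a hotter one gives more varied completions.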

Indeed, knowing how to write good coding prompts is key. The leading AI/ML engineers are using the prompt libraries below in combination with CoPilot/Codex & ChatGPT to obtain better coding results from prompting:

  • OpenPrompt - a PyTorch library for prompt-learning, a new paradigm

  • PromptSource - a Python toolkit for creating, sharing and using prompts

  • betterprompt - a library for testing LM prompts
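
To make "a good coding prompt" a bit more concrete, here is a rough sketch of one common pattern: give the model a function signature plus a docstring that states the task and shows an input/output example, then let it complete the body. The helper `build_code_prompt` and the `dedupe` task are hypothetical, written for this illustration only. It uses plain string formatting and none of the libraries listed above.

```python
def build_code_prompt(signature, description, examples):
    """Assemble a docstring-style prompt for a code model.

    The result is a function signature followed by a docstring that states
    the task and lists example calls with their expected results.
    """
    lines = [signature, '    """' + description, ""]
    for call, result in examples:
        lines.append(f"    >>> {call}")
        lines.append(f"    {result}")
    lines.append('    """')
    # End with a newline so the model continues with the function body.
    return "\n".join(lines) + "\n"


# Hypothetical task used only to show the shape of the prompt.
prompt = build_code_prompt(
    "def dedupe(items):",
    "Return items with duplicates removed, preserving first-seen order.",
    [("dedupe([3, 1, 3, 2, 1])", "[3, 1, 2]")],
)
print(prompt)
```

The resulting text is what you would paste into the editor or send to the model. The signature and the example give it far more to go on than a one-line request.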

If you want to learn more about Prompt Engineering, this is a really great, comprehensive overview: A Complete Introduction to Prompt Engineering for Large Language Models.

Soon, OpenAI will add to ChatGPT what they call Prompt Palettes, which are pre-written prompts for all sorts of tasks. CoPilot Labs has already added a feature called Brushes that allows you to change code to make it cleaner, more robust and document it automatically.

Want some ideas on using CoPilot/Codex and ChatGPT for coding? Here are a few:

  • How to Prompt Open AI Codex to Produce the Code you Want

  • 11 Ways you Can Use ChatGPT to Write Code

  • Using CoPilot for ML Engineering

  • Introducing Infinite AI Array to ChatGPT for solving typical Python issues on lists, dictionaries and types. So brilliant!

  • Code explanation, code translation and custom prompts with CoPilot Labs

I know many AI/ML engineers who are very negative about, even dismissive of, AI Pair-Programming and Code Prompting… “It’s just a fad. Plus the AI makes errors…”

In Exploring the Verifiability of Code Generated by GitHub Copilot, the researchers conclude that CoPilot generates verifiable code for low- and medium-difficulty coding tasks.

But the point is not about the AI making errors. Sure! CoPilot makes errors; human coders make errors too. And of course, you need to be a good coder to use AI Pair-programming & Code Prompting.

The point is about AI coding superpowers, reducing coding time drastically…automating testing, documentation, bug fixing, refactoring… Becoming a code prompter and editor supported by AI, instead of just being a code writer aided by Google Search & StackOverflow…

I don't even really code, I prompt & edit…

StackOverflow has already banned posting ChatGPT-generated content. ICML 2023 is prohibiting the use of LLMs like ChatGPT to write academic papers. The NYC Ed. Dept. has banned ChatGPT across all NYC schools too.

I guess all that is futile gatekeeping. The barbarians are arriving. Eventually, AI/ML engineers will become supervisors & managers of tribes of AI programming bots.

“Programming will be obsolete. I believe the conventional idea of ‘writing a program’ is headed for extinction, and indeed, for all but very specialized applications, most software, as we know it, will be replaced by AI systems that are trained rather than programmed.” (The End of Programming, Communications of the ACM, Jan 2023)

How long until then? I reckon much sooner than we expect. What do you think? I’m really interested in your input and comments.

Have a nice week.


10 Link-o-Troned

  1. [awesome] Some Remarks on Language Models

  2. Neuroscience + AI: neuroAI Comes of Age

  3. A Review: Top, Interesting Language AI Papers, 2022

  4. The Expanding Dark Forest and Generative AI

  5. Intro to Graph ML & Graph Transformers

  6. Open Assistant: An Open Source Chat Language Model

  7. AI for Game Dev: Creating a Farming Game in 5 Days

  8. web-transformers: Run Transformers in the Browser

  9. A Guide to Building an Open Source Neural Search Engine

  10. [Free] Harvard AI Research Experiences – The Course Book

  11. All Things AI- A Complete Resource of AI Tools & Apps, 2023



A Pythonista *Experience*

  1. Tutorial: Object Detection with Facebook DETR Transformer

  2. How-to: Build a Neural Collab-Filtering RecSys in Tensorflow

  3. NeuralFit - An Easier Way to Complex Neuro Evolutionary Nets

Scripting aRt

  1. gpttools - Easily Incorporate ChatGPT in your ML Workflow

  2. MLOps: The Whole Game

  3. Causal Discovery in Heavy-tailed Models

Deep & Other Learning Bits

  1. Graph Neural Nets: A Study Guide

  2. Intro to RL with Human Feedback (RLHF): From Zero to chatGPT

  3. A Survey of In-Context Learning

ResearchDocs

  1. Google Muse: SoTA Text-to-Image Generation (Jan 2023)

  2. MS Research VALL-E: SoTA Text-to-Speech (Jan 2023)

  3. Diffusion-GAN: Training GANs with Diffusion

El Robótico

  1. Mini Cheetah Clone Quadruped Teardown

  2. Testing Robotic Shoes, Walk 250% Faster

  3. The Most Realistic, Diverse Autonomous Driving Simulator

data v-i-s-i-o-n-s

  1. ManimML: Animations & Viz of Common ML Concepts

  2. The List of 2022 Best Dataviz Lists

  3. From Data to Viz: A Decision Tree

DataEng Wranglings

  1. Stop Using Airflow for Data Science

  2. Clickhouse Local - A Small Serverless SQL Tool for Data Engineers

  3. A Data Catalog for Data Engineers Who Hate Data Catalogs

AI startups -> radar

  1. Detangle - AI for Legal Docs Summarisation

  2. Precision Neuro - Next Generation Neural Engineering

  3. ExynAI - Autonomous Drones for Dangerous Places

ML Datasets & Stuff

  1. Argoverse 2 - Next Generation Dataset for Self-Driving

  2. SODA - 1st Public, Large-scale Dataset on Social Interactions

  3. MindBigData - THE MNIST Dataset of Brain Signals

Postscript, etc

Tips? Suggestions? Feedback? email Carlos

Curated by @ds_ldn in the middle of the night.

1 Comment
KW NORTON
Writes KW Norton Borders
Jan 8

Love your title and these articles. Timely and super important. Since I am not technically oriented by nature I find this material helpful.
