Data Machina #189

LLM research topics. Common Sense in chatbots. MarioGPT. TikTok's recommender embeddings. Piloting a GPT bot in a law firm. Stanford Prompting, Finetuning & RLHF. Focal modulation vs. Self-Attention.

Carlos
Feb 19, 2023

Some Research Topics on Language Models. There’s so much stuff happening around LLMs that it is challenging to filter what to read. Here are my two cents on some interesting LLM research from the last 15 days.

Augmenting LLMs. Originally, LLMs were not designed for search, calculation, knowledge retrieval, or symbolic tasks. Inevitably, building LLM apps requires augmentation. This paper gives you an overview of the latest work: Augmented Language Models: a Survey.

On a practical note, LangChain and GPTIndex are excellent tools for augmenting and extending LLMs. Check out these two links below, and the sketch that follows them:

  • Build, Query & Visualise a Knowledge Graph with GPTIndex

  • How to build a Q&A bot on documentation with ChatGPT & LangChain
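If you just want to see the shape of the retrieval-augmentation pattern without committing to a framework, here is a minimal, framework-agnostic sketch: embed the documentation chunks, retrieve the most similar ones for a question, and stuff them into the prompt. The toy corpus, the sentence-transformers model and the `retrieve` helper are illustrative assumptions, not LangChain or GPTIndex APIs.

```python
# Minimal retrieval-augmented prompting sketch (not the LangChain/GPTIndex API).
# Assumes sentence-transformers is installed; docs and question are toy examples.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "To install the CLI run `pip install ourtool`.",
    "Authentication uses an API key set via the OURTOOL_KEY env variable.",
    "The `sync` command uploads local changes to the server.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documentation chunks most similar to the question."""
    q_emb = model.encode([question], normalize_embeddings=True)
    scores = doc_emb @ q_emb[0]          # cosine similarity (embeddings are normalised)
    top = np.argsort(-scores)[:k]
    return [docs[i] for i in top]

question = "How do I authenticate?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # pass `prompt` to your LLM of choice (GPT-3, ChatGPT, etc.)
```

LangChain and GPTIndex wrap essentially this pattern, plus document chunking, vector stores and prompt management, behind higher-level abstractions.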

A team @Stanford_NLP released the source code of DSP: Demonstrate–Search–Predict Framework, which enables you to build rich interactions between retrieval models (RMs) and language models (LMs). This is great for developing complex Q&A or conversational search bots.

LLM model development. Developing with LLMs is still very inefficient. One common issue is that developers end up processing the same pieces of text over and over again. If you read How to Build a Chatbot with GPT-3, you can see how many times the developer has to copy and run the original prompt.

Techniques like prompt templates and prompt chaining can help solve these inefficiencies. This week, to address this issue, a team of researchers @Allen_AI introduced a new way to make LLM development more sustainable with Embedding Recycling.
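The intuition behind embedding recycling is plain caching: if the same passages keep flowing through a model, store their representations once and reuse them. Here is a loose sketch of that idea, not the Allen AI implementation; the encoder, cache key and example prompt prefix are assumptions for illustration.

```python
# Loose sketch of "embedding recycling": cache encoder outputs for repeated text
# so they are only computed once. Not the Allen AI implementation.
import hashlib
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

_cache: dict[str, torch.Tensor] = {}

def embed(text: str) -> torch.Tensor:
    """Return the [CLS] embedding, reusing a cached result for previously seen text."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _cache:
        with torch.no_grad():
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            _cache[key] = encoder(**inputs).last_hidden_state[:, 0]
    return _cache[key]

# The second call with the same prompt prefix hits the cache instead of the encoder.
embed("You are a helpful assistant. Summarise the following ticket:")
embed("You are a helpful assistant. Summarise the following ticket:")
```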

Triggered by the fact that LLMs involve so many different tasks and layers of computation, a team @MSResearch launched a new initiative called LLMOps for building LLM products. That’s right: Large Language Model Operations ;-)

Developing a conversational agent that can reason while demonstrating common sense is something of an ultimate goal in AI chatbot development. Here’s a good read: Common Sense Reasoning for Conversational AI: A Survey of the State of the Art.

Generalist vs. specialised LLMs. This is a classic in ML: generalisation vs. specialisation. There are many methods to build and improve generalist LLMs. These researchers have developed AdapterSoup, which uses weight averaging across adapters trained on different domains to improve the generalisation of pretrained LMs.
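The weight-averaging step itself is a one-liner once you have the checkpoints: average each parameter tensor across adapters trained on different domains. Below is a rough sketch of just that step; AdapterSoup additionally selects which adapters to average per test domain, which is omitted here, and the file names are hypothetical.

```python
# Rough sketch of the weight-averaging step behind "model soup"-style methods
# such as AdapterSoup. Per-domain adapter selection is omitted.
import torch

def average_state_dicts(state_dicts: list[dict]) -> dict:
    """Average each parameter tensor across a list of compatible state dicts."""
    keys = state_dicts[0].keys()
    return {
        k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
        for k in keys
    }

# Hypothetical: three adapters fine-tuned on different domains, same architecture.
adapters = [torch.load(p) for p in ["legal.pt", "medical.pt", "finance.pt"]]
souped = average_state_dicts(adapters)
# model.load_state_dict(souped, strict=False)  # load into the adapter-equipped model
```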

It is still not fully understood how a generalist model can perform many NLP tasks in a zero-shot setting. In this paper, Is ChatGPT a General-Purpose NLP Task Solver?, the researchers discuss the challenges of LLM generalisation in depth.

There is a lot of demand for developing specialised LLMs for enterprise verticals. One way to build bespoke LLMs for specialised domains is through sophisticated prompting. I enjoyed reading these two papers on specialisation & prompting (see the sketch after the list for the basic soft-prompt mechanism they build on):

  • SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains

  • À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting
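Both papers build on soft prompts: a handful of trainable embedding vectors prepended to the input while the LM itself stays frozen. Here is a bare-bones sketch of that mechanism, i.e. generic prompt tuning rather than SwitchPrompt’s gating or APT’s composition; the base model and prompt length are illustrative.

```python
# Bare-bones soft prompt tuning: learn a few prompt vectors, keep the LM frozen.
# Generic prompt tuning, not SwitchPrompt's gating or APT's composition.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
for p in lm.parameters():          # the base model stays frozen
    p.requires_grad = False

n_prompt = 20
soft_prompt = nn.Parameter(torch.randn(n_prompt, lm.config.n_embd) * 0.02)

def forward_with_prompt(text: str):
    ids = tokenizer(text, return_tensors="pt").input_ids
    tok_emb = lm.transformer.wte(ids)                        # token embeddings
    inputs = torch.cat([soft_prompt.unsqueeze(0), tok_emb], dim=1)
    return lm(inputs_embeds=inputs)                          # only soft_prompt would train

out = forward_with_prompt("Classify the sentiment of: great product!")
print(out.logits.shape)
# In a real run, an optimiser over [soft_prompt] alone does the fine-tuning.
```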

LLMs for AI pair-programming & coding. GPT/Codex-based tools are great for code prototyping and low- to mid-level pair-programming. But if you want to scale and industrialise code development, how do you evaluate the code generated by an LLM? Enter CodeBERTScore: a new way to evaluate code generation with pretrained models of code.
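The idea is essentially BERTScore applied to code: embed the candidate and reference snippets with a pretrained code model and score them via greedy token-level cosine matching. A rough sketch of that idea follows; this is not the official CodeBERTScore package, and the choice of microsoft/codebert-base as the encoder is an assumption.

```python
# Rough BERTScore-style similarity between generated and reference code,
# using a pretrained code encoder. Not the official CodeBERTScore package.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
enc = AutoModel.from_pretrained("microsoft/codebert-base").eval()

def token_embeddings(code: str) -> torch.Tensor:
    with torch.no_grad():
        inputs = tok(code, return_tensors="pt", truncation=True)
        hidden = enc(**inputs).last_hidden_state[0]      # (seq_len, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)

def code_score(candidate: str, reference: str) -> float:
    """F1 of greedy token-matching cosine similarities (BERTScore-style)."""
    c, r = token_embeddings(candidate), token_embeddings(reference)
    sim = c @ r.T
    precision = sim.max(dim=1).values.mean()
    recall = sim.max(dim=0).values.mean()
    return (2 * precision * recall / (precision + recall)).item()

print(code_score("def add(a, b): return a + b", "def add(x, y):\n    return x + y"))
```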

Some developers are fascinated by how LLMs can support programming tasks. David wrote a long post on why ChatGPT Is An Extra-Ordinary Python Programmer.

Efficient training, inference & fine-tuning of LLMs. Fine-tuning LLMs and running inference with them are very computationally expensive. Many researchers are finding new, super-efficient ways to reduce these computational costs. Here is a new SoTA approach that addresses LLM computational challenges: PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware.
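In practice this is the Hugging Face peft library, which wraps an existing model so that only a small set of adapter weights (e.g. LoRA matrices) are trained. A minimal sketch; the base model and LoRA hyperparameters below are illustrative, not a recipe from the post.

```python
# Minimal LoRA fine-tuning setup with the Hugging Face peft library.
# Model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")

lora = LoraConfig(
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["query_key_value"],  # attention projections in BLOOM
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()       # only a tiny fraction of weights will train
# ...then train `model` with your usual Trainer / training loop.
```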

On a similar line of research, the Big Little Transformer Decoder is a new framework developed to improve inference efficiency and latency for a wide range of LLM applications.
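The general pattern is: a small model drafts a few tokens cheaply, and the large model checks them in a single forward pass, keeping only the prefix it agrees with. The sketch below shows that generic big/little idea with two GPT-2 sizes as stand-ins; it is not the paper’s exact fallback and rollback policy.

```python
# Very rough sketch of the big/little decoding idea: a small model drafts tokens,
# the large model verifies them in one forward pass and keeps the agreeing prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
little = AutoModelForCausalLM.from_pretrained("gpt2").eval()        # stand-in "little" model
big = AutoModelForCausalLM.from_pretrained("gpt2-medium").eval()    # stand-in "big" model

@torch.no_grad()
def draft_and_verify(ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    draft = ids
    for _ in range(k):                                   # 1) little model drafts k greedy tokens
        nxt = little(draft).logits[:, -1].argmax(dim=-1, keepdim=True)
        draft = torch.cat([draft, nxt], dim=1)
    # 2) big model scores the whole drafted continuation in a single pass
    big_pred = big(draft).logits.argmax(dim=-1)[:, ids.shape[1] - 1 : -1]
    drafted = draft[:, ids.shape[1]:]
    n_ok = int((big_pred == drafted).cumprod(dim=-1).sum())  # 3) length of agreeing prefix
    accepted = drafted[:, :n_ok] if n_ok else big_pred[:, :1]
    return torch.cat([ids, accepted], dim=1)

ids = tok("Large language models are", return_tensors="pt").input_ids
print(tok.decode(draft_and_verify(ids)[0]))
```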

A team of researchers @CMU & @HPE developed a new, general cross-modal fine-tuning framework that achieves SoTA across several LLM tasks.

Gaming & LLMs. Gaming has long been one of the main drivers of AI research. This week I came across two really interesting papers on gaming and LLMs:

  • Level Generation Through LLMs

  • MarioGPT: Open-Ended Text2Level Generation through LLMs

Here’s a playable demo of MarioGPT. Enjoy!

Have a nice week.

Thanks for reading Data Machina! Subscribe for free to receive new posts every week.

10 Link-o-Troned

  1. Wolfram: What Is ChatGPT Doing … and Why Does It Work?

  2. Stanford CS224N: Prompting, Finetuning & RLHF (2023 slides, pdf)

  3. Intro to Deep Causal Learning

  4. The Secret Sauce of TikTok’s Recommendations with Embedding

  5. Why I Chose Working @OpenAI Over Academia

  6. What Happened When a Top 5 UK Law Firm Piloted Harvey GPT Bot

  7. Promptable: Build Full Stack AI LLM Apps in Javascript

  8. Visualization of a Fully Connected Neural Network

  9. I Required My Students to Use ChatGPT. This is What I Learned

  10. On Hybrid, Vector Search for Improving Relevance & Ranking



the ML Pythonista

  1. A PyTool for Transcript Search & Summarization with ChatGPT

  2. [Tutorial] Focal Modulation: A Replacement for Self-Attention

  3. Overview of the New Powerful Features in BERTopic

the ML codeR

  1. The New forester Package for AutoML Tree-based Models

  2. Large Scale MatMul in a Laptop with R: DuckDB vs SQLite

  3. Time-Series Predictions with NBeats, XGBoost & tidymodels

Deep & Other Learning Bits

  1. Zero-shot Image-to-Text generation with BLIP-2

  2. Simple, Efficient, Long Convolutions for Sequence Modeling

  3. TPVFormer: An Alternative to Tesla's Occupancy Network

AI/DL ResearchDocs

  1. Efficient 360 Vision Transformers for Industrial Apps

  2. A Survey on Efficient Training of Transformers

  3. Google Research: Scaling Vision Transformers to 22 Billion Parameters

El Robótico

  1. Meta AI: Generative Augmentation for Robot Learning

  2. CMU: Autonomous Robot that Learns with Little Supervision

  3. A Day in the Life of a Software Engineer @Amazon Robotics

data v-i-s-i-o-n-s

  1. [NYT Data Story] LeBron James: The Greatest NBA Scorer

  2. pyCirclize: Create Beautiful Circular Visualisations in Python

  3. Star Tours: Visualising a 3D Map of Our Galaxy

MLOps Untangled

  1. Understanding the MLOps Knot

  2. Streaming Large Scale ML Datasets with MosaicML

  3. [Free course] ML Engineering for Production (MLOps) Specialization

AI startups -> radar

  1. Bedtime Stories - AI for Crafting Wonderful Bedtime Stories

  2. VivaCity - AI for Smart Traffic Monitoring

  3. SandboxAQ - Enterprise SaaS for AI + Quantum Tech

ML Datasets & Stuff

  1. Mozilla Common Voice Dataset, 24K speech hours

  2. A Critical Field Guide for Working with ML Datasets

  3. MS Research DigiFace: 1 Million Digital Face Images

Postscript, etc

Enjoyed this post? Feel free to share it.


Tips? Suggestions? Feedback? Email Carlos.

Curated by @ds_ldn in the middle of the night.
