Data Machina #221
LLM-RAG Apps Infeasible? Causal Topological Deep Learning. First look at GPT-4V. Visualising MatMul & Attention. AgentOps. Autogen & AI Agents. SapientML. Mistral 7B. MIC V-L Model. AnyMal Any Model.
GenAI, LLM+RAG Apps = Business Frustration. This week, a few customers called in saying they are stopping their PoCs on LLM RAG apps. tbh, I’m not surprised. Why? Sure! Lots of “exciting” experimentation & exploration, but RAG is not (yet) ready for enterprise prod. Many business guys can’t (yet) justify a proper business case that correctly balances: cost vs. value vs. accuracy, relevancy and quality of LLM outputs for building reliable, consistent enterprise apps. It’s reality mate!
My friend Christian recently wrote about this in A Deep Dive into the (In)Feasibility of RAG with LLMs. A great self-reflection on falling down the LLM-RAG rabbit hole. In another post, Madhukar provides some good tips on Optimising RAG LLM Apps for Better Performance, Accuracy and Lower Costs. But enough of LLMs for now!
Topological Data Analysis (TDA). When I was at uni, I miserably failed the courses on Algebraic Topology & Descriptive Geometry several times! If you want to torture your brain smoothly, grab a beer and read this free book: Algebraic Topology for Data Scientists, Aug 2023 (pdf, 309 pages).
Eventually I recovered from all that. Many years later, I got interested in applying topology to data & ML in enterprise apps. I met amazing startups, researchers, and engineers who were working on TDA. But at the time, TDA was not ready for business apps. Things have changed quite a lot recently, though. Let’s see.
So What is Topological Data Analysis: An intro. TDA is about analysing the shape of data and discovering hidden data patterns with topology; not a very intuitive concept. It requires a background in set theory, descriptive geometry, and so on. But if you step away from the deep theory, grab the basic concepts, and think about it in practical terms, there is a lot of value in applying TDA to data & ML. This is a nice series of intro videos on TDA.
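To make "the shape of data" a bit less abstract, here is a minimal sketch (my own toy example, not taken from any of the linked resources). It links points that are at most `eps` apart and counts the resulting connected components, which is the simplest topological summary you can compute. The function name `count_components` and the sample points are made up for illustration.

```python
import math
from itertools import combinations


def count_components(points, eps):
    """Count connected components when points at most eps apart are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two components

    return len({find(i) for i in range(len(points))})


# Toy data (hypothetical): two small clumps far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

for eps in (0.05, 0.5, 10.0):
    print(eps, count_components(points, eps))
```

At a tiny scale every point is its own component, at a medium scale the two clumps show up, and at a large scale everything merges into one blob. TDA is largely about tracking which features survive across a wide range of scales.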
A Tutorial on TDA. A practical, live, Python tutorial & repo on TDA.
A free book on Topological Data Analysis. This book covers the application of topological techniques to traditional data analysis. Download it here: Computational Topology for Data Analysis.
Deep Learning/ ML & Topological Data Analysis. In recent years, there have been a lot of advances in combining ML/DL and topological methods. This is partly due to the exponential improvements in computational methods and DL, and partly due to new, more practical ways of translating topology & math theory into real-world apps.
Why Deep Learning & Topological methods together? It’s not necessarily apparent or intuitive why you would want such a combo. Chris (one of the co-founders of Anthropic AI, formerly a researcher @OpenAI) explains why in this blog post: Neural Networks, Manifolds, and Topology.
Intro to Topological Deep Learning. A good 45 min. introduction and discussion on Topological DL. This video session is based on the paper Architectures of Topological Deep Learning. A beautifully written paper that surveys the latest on Topological Neural Networks and Topological Deep Learning. Recommended read.
Topological techniques for Unsupervised Learning. An awesome talk on how you can apply TDA to unsupervised learning tasks like clustering, dimensionality reduction, embeddings, and also large-scale dataviz.
A topological ML pipeline for classification. An easy, ready-to-use pipeline for data classification using Topology & ML. The pipeline links persistence diagrams to digital data, using efficient filtration for the type of data considered. Paper: A Topological Machine Learning Pipeline for Classification.
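If "persistence diagram" sounds cryptic, the sketch below may help. To be clear, this is not the paper's pipeline, just a toy illustration under my own assumptions: it computes the 0-dimensional persistence pairs of a point cloud (each component is born at scale 0 and dies at the length of the edge that merges it into another component) and then squashes them into a small feature vector that a classifier could consume. The names `h0_persistence` and `diagram_features`, and the choice of summary statistics, are hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) for a point cloud.

    Edges are added in order of increasing length (Kruskal-style); each
    edge that merges two components kills one of them at that length.
    The last surviving component never dies and is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, dist))
    return pairs


def diagram_features(pairs):
    """Hypothetical fixed-length summary: count, total and max lifetime."""
    lifetimes = [death - birth for birth, death in pairs]
    return [len(lifetimes), sum(lifetimes), max(lifetimes, default=0.0)]


# Toy data (made up): two clumps, so one pair should live much longer.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
pairs = h0_persistence(cloud)
features = diagram_features(pairs)
print(pairs)
print(features)
```

The one long-lived pair is the gap between the two clumps; the short-lived ones are noise within each clump. Real pipelines use proper TDA libraries and richer vectorisations, but the idea of "diagram in, features out" is the same.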
Anomaly detection with topological analysis. An approach that uses topology for anomaly detection. It is essentially a density-based outlier detection algo that, instead of calculating local densities, constructs a graph of the data using nearest neighbours. Check out this repo and associated paper: Topology Anomaly Detection.
Awesome Topological DL. This is a great curated list of topological deep learning (TDL) resources and links. Link: Awesome Topological Deep Learning.
Causality + TDA + Deep Learning. There have been some recent, awesome developments in Causal Discovery. Check out the very latest v2 of Salesforce CausalAI Library: A Fast, Scalable framework for Causal Analysis of Time Series & Tabular Data.
If you can combine Causal Discovery with TDA and DL, you could do some pretty amazing stuff to solve real business problems. Sure, TDA is not a panacea (it has some shortcomings), and neither is DL or Causal Discovery. But this combo is pretty powerful.
All this brings me to DataRefiner, a startup working on Deep Topological Analysis. Check out this post on The Advantages of Deep TDA: Topological Data Analysis + Self-Supervised ML. Also read more about their latest approach to combining Deep TDA, Causal Discovery and Boosted Trees.
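Here is a rough sketch of the graph idea, under loudly stated assumptions: this is not the repo's code, and the specific rules (link each point to its `k` nearest neighbours only if they are within `radius`, then flag connected components smaller than `min_size`) are my own simplification. The function name `graph_outliers` and all parameter values are hypothetical.

```python
import math


def graph_outliers(points, k=2, radius=1.0, min_size=3):
    """Flag points that sit in small components of a nearest-neighbour graph."""
    n = len(points)
    neighbours = {i: set() for i in range(n)}

    # Build an undirected graph: link i to its k nearest points,
    # but only when they are within the distance cutoff.
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for dist, j in dists[:k]:
            if dist <= radius:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Walk each connected component; small ones are treated as anomalies.
    seen, outliers = set(), []
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) < min_size:
            outliers.extend(component)
    return sorted(outliers)


# Toy data (made up): four nearby points and one far away.
sample = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (8.0, 8.0)]
print(graph_outliers(sample))
```

No local density is ever estimated: the far-away point simply ends up disconnected from the main graph, and that is what flags it.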
Disclosure and blatant plug: I hold a position as Director AI Consulting at DataRefiner. I met Ed (the founder of DataRefiner) ages ago, when he was starting R&D on TDA. Ed is an amazing, smart ML engineer. At the time, I challenged him to address several key issues that were preventing the adoption of TDA in enterprise. Many moons later, he came back and said: “OK cool, I sorted out all the issues you said, and much more!” If you’re interested in the cool stuff :-) we’re doing at DR or simply want to chat about Deep TDA over coffee, drop me an email at carlos@datarefiner.com.
Have a nice week.
10 Link-o-Troned
the ML Pythonista
Deep & Other Learning Bits
AI/ DL ResearchDocs
HQ Video Generation with Diffusion Models (paper, code, demo)
ProlificDreamer: High-Fidelity, Diverse Text-to-3D Generation
Meta AI AnyMAL: An Efficient, Scalable Any-Modality Augmented LM
data v-i-s-i-o-n-s
MLOps Untangled
AI startups -> radar
ML Datasets & Stuff
Postscript, etc
Tips? Suggestions? Feedback? email Carlos
Curated by @ds_ldn in the middle of the night.