Data Machina #146

State-of-the-Art in NLP: There’s so much going on in NLP right now (Transformers, Transfer Learning, Pre-Trained Language Models such as ELMo, ULMFiT and BERT, Deep NLP…) that it’s really difficult to keep up.

Here’s the latest presentation on BERT by Jacob Devlin of Google AI, its lead ‘creator’.

I also suggest checking out the latest materials from the two courses below:

You are reading the full weekly version of Data Machina. If you’d like to keep receiving it after March 1, please click here to subscribe and get 35% off. I’ll be massively thankful for your contributions and support. If you don’t subscribe before March 1, you’ll receive a shorter, free version of Data Machina every two weeks.

10 Link-o-Troned

  1. The Multi-Armed Bandit Problem and Its Solutions

  2. How to Build Auto Machine Learning from Scratch

  3. Why Are Machine Learning Projects So Hard to Manage?

  4. Transfer Learning for Natural Language Generation

  5. Fast, Scalable Gradient Boosting on Decision Trees in Python, R, C++

  6. Introducing Uber’s Ludwig: A Code-free Deep Learning Toolbox

  7. Python, R & Scala Challenges: Why Swift for TensorFlow ML

  8. New Trends in Large, Generalized, Pre-Trained Language Models

  9. Facebook’s New, SotA Predictive Model for Grid Mapping

  10. Limitations of Deep Learning for Vision: How We Might Fix Them

A Pythonista *Experience*

  1. Spektral - Build Graph Neural Networks on Top of Keras

  2. A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)

  3. An Open-Source Version Control System for Machine Learning Projects

beCause of Dennis & Bjarne

  1. A C++ Framework for Realtime Machine Learning

  2. DeepTrainer - An Open Source C++ Library for Deep Learning

  3. A C++ Implementation of K-Means & KNN on the MNIST Dataset

Scripting aRt

  1. Modelling Unexpected Shocks in Time Series w/ Kalman Filters

  2. Clustering Multiple Time-series Data Streams

  3. Hyper-parameter Optimisation: Sobol Sequence vs. Uniform Random

Love from Julia

  1. Tutorial: Geostatistics in Julia

  2. Travelling Salesman Problem for 200K Cities

  3. Optimised Directed & Undirected Graphs in Julia


  1. AI Systems: Foundations for AI Minds

  2. Deep Learning in Clojure from Scratch: Why Bother?

  3. Blockchains as Information Systems with Clojure


  1. Automate ML Workflows w/ Scala & Spark at Massive Scale

  2. Structured Deep Learning w/ Probabilistic Neural Programs

  3. High Performance Functional Bayesian Inference in Scala

data v-i-s-i-o-n-s

  1. Quick, Flexible Visual Representations of Large Datasets

  2. How The BBC Visual & Data Journalism Team Works w/ Graphics

  3. A New, Scalable, Multivariate Graph Visualization Technique

Distributed de-Entangler

  1. Cloud Coding Simplified: A Berkeley View on Serverless Computing

  2. Serverless Data Ingestion into Google BigQuery

  3. Designing Data-Intensive Apps & Scalable Systems [pdf, 559 pages]

Blockchain Über Alles

  1. Blockchain ETL - Making Blockchain Data Easy to Access

  2. A Powerful Framework for Developing & Deploying DApps

  3. A Demo of a Passenger Journey in Blockchain

IoTea - everyThing/anyThing

  1. Overview: Edge TPU Devices for Embedded Machine Learning

  2. What’s Wrong with the Raspberry Pi

  3. Build Your Own Dial-Up Server with Linux Devices


  1. A Guiding Principle for Causal Decision Problems

  2. Browse State-of-the-Art ML Papers with Code & Datasets

  3. Learning and Evaluating General Linguistic Intelligence

Algorithmic Potpourri

  1. Bubble Sort: An Archaeological Algorithmic Analysis

  2. Optimized Brewery Road Trip with a Genetic Algorithm

  3. Hashing for Large-Scale Similarity

Robots & Cyborgs like <you>

  1. [free book] Modern Robotics, Cambridge Uni (pdf, 617 pages)

  2. Deep Learning for Robotics - Current Research Topics

  3. Real-time, 6DoF Perception & Navigation for Commercial Drones

Deep & Other Learning Bits

  1. Deep Learning, UC Berkeley 2019 - 28 Video Lectures

  2. Transfer Learning: Understanding ULMFiT Building Blocks

  3. Practical Deep Learning for Coders v3, 2019

startups -> radar

  1. Munchron - AI for Detecting Medical Conditions from Images

  2. Behavox - 1st-ever AI-based Behavioral OS for Financial Services

  3. Skyline - Real Estate Investment Meets AI

ML Datasets & Stuff

  1. GQA Dataset - 20 Million Questions on Real-world Images

  2. Google Natural Questions Dataset

  3. Hotels-50K: A Global Hotel Recognition Dataset

Postscript, etc

Spread the word: share Data Machina with your friends.

Tips? Suggestions? Feedback? Send an email to Carlos.

Curated by Carlos @ds_ldn in the middle of the night.