Data Machina #147

The NLP Arms Race continues… Just a few days ago Microsoft AI released MT-DNN, a Multi-Task Deep Neural Network that outperforms Google’s BERT on almost every task in the GLUE NLP benchmark.

OpenAI became ClosedAI when they decided not to release the full version of GPT-2, their large-scale language model that can generate convincing fake text, because it was… too dangerous.

The folks @OpenAI thought it’d be a great idea to invite a few selected journos to see the results behind closed doors and spin the PR news cycle: New AI fake text generator may be too dangerous to release, say creators

Several people in the NLP/ML community followed up with some interesting posts:

In the meantime, someone has published an Open Clone of OpenAI's Unreleased WebText Dataset …LOL!

A plea for your help to fund Data Machina. Help me keep Data Machina fully independent and free from ads, sponsors, or any marketing bias.

This is the weekly, full version of Data Machina.

If you’d like to keep receiving it after March 1, please click here to subscribe and get a 35% discount. You don’t have to do anything to receive the free, much shorter version of Data Machina, which covers a few topics every 2 weeks.

Your paid subscription will help me keep Data Machina independent and let me keep curating 100% unbiased content. Many thanks for your support and help.

10 Link-o-Troned

  1. Probabilistic Programming and AI

  2. Data Science is Different Now

  3. The Unreasonable Effectiveness of Deep Feature Extraction

  4. How Powerful are Graph Neural Networks [pdf opens slowly]

  5. In-Depth Tutorial: AllenNLP (From Basics to ELMo & BERT)

  6. OpenAI: Better Language Models and Their Implications

  7. Berkeley AI: Controlling False Discoveries in Large-Scale Experiments

  8. Facebook AI Open Sources New ELFOpenGo Dataset and Research

  9. An Open Source Engine for Search & Machine Learning Ranking

  10. Andrew Ng: How to Choose Your First AI Project

  11. Yann LeCun: Deep Learning Will Require New Types of Hardware

A Pythonista *Experience*

  1. Automatic Differentiation + Optimization in PyTorch

  2. Neural Nets + Gaussian Processes: The Neural Processes Family

  3. Pretrained Language Models for Google's BERT, OpenAI GPT-2

beCause of Dennis & Bjarne

  1. Mask R-CNN for Object Segmentation in C++

  2. xForest - Super Fast, Scalable Random Forests in C++

  3. Flashlight - A C++ Library for Machine Learning

Scripting aRt

  1. Probability & Statistics: A Simulation-based Introduction

  2. Explore & Visualise Boosted Regression Trees

  3. Anatomy of a Logistic Growth Curve

Love from Julia

  1. A Julia Package for Prescriptive Analytics

  2. Solving Partially Observable Markov Decision Processes

  3. Julia Reinforcement Learning


  1. Clojure at Netflix: The Good, The Bad & The Ugly

  2. Object Detection with MXNet Clojure

  3. Intro to Probabilistic Programming with MIT’s MetaProb


  1. Testing Machine Learning Systems in Staging

  2. Introduction to Kafka Streaming with Scala

  3. Initial Impressions of Scala from a Java & Python Data Engineer

data v-i-s-i-o-n-s

  1. Visualising Global Temperature Anomalies 1880-2017

  2. An Alt, Data-Driven Country Map [Winner World Dataviz Prize]

  3. The DNA of Good Government [Winner World Dataviz Prize]

Distributed de-Entangler

  1. What Comes after Serverless? A Deployless Future

  2. Federated Learning: The Future of Distributed Machine Learning

  3. Hipster Shop: Cloud-Native Microservices Demo App & Code

Blockchain Über Alles

  1. ETH Zurich Research: Bitcoin as a Transaction Ledger [pdf]

  2. The Ocean Protocol for Decentralized AI Data & Services [pdf]

  3. Open Source Blockchain & Smart Contracts with Hyperledger Fabric

IoTea - everyThing/anyThing

  1. From TensorFlow to ML Kit: ML for Android Apps

  2. Machine Learning for Mobile with TensorFlow

  3. QP/C++ Open Framework for Real-time Embedded Systems


  1. A New Theory for Selective Prediction

  2. Explainable Text-Driven Neural Net for Stock Prediction

  3. Cool Papers: Machine Learning/AI in Fashion

Algorithmic Potpourri

  1. Closeness Centrality in Neo4j

  2. Reinforcement Learning Algorithms: Free Book & Tutorial

  3. [free book] Algorithms for Walking, Running, Flying… Robots

Robots & Cyborgs like <you>

  1. MIT Self-Driving Cars Lecture by the Chief Scientist @Waymo

  2. PythonRobotics - A Collection of Python Code for Robotics

  3. Robotic Soft Sensing with Embedded Sensors & RNNs

Deep & Other Learning Bits

  1. Introduction to Reinforcement Learning [pdf, 519 pages]

  2. xfer: Open Source Neural Network Transfer Learning

  3. Deep Unsupervised Learning - UC Berkeley Spring 2019

startups -> radar

  1. Kite - AI Turbocharged Python Programming

  2. BlazingDB - GPU-accelerated SQL for AI Workloads

  3. - Everyone’s Intelligent AutoML

ML Datasets & Stuff

  1. UK Weather Stations Dataset 1853-2019

  2. The NSFW (Not Safe for Work) Dataset, 220K Images

  3. The Visual AI Dialog Challenge Dataset

Postscript, etc

Spread the word: share Data Machina with your friends.

Tips? Suggestions? Feedback? Send an email to Carlos.

Curated by Carlos @ds_ldn in the middle of the night.