Data Machina #145

Someone asked about Machine Learning in production: 

What’s your team’s approach to tracking the quality of ML models in production? How do you know if a model is decaying? How do you quality-check the data going into a model? Who builds and tracks these things?

Answer: Mo models, mo problems: tracking the quality of ML models in production


I see this in many enterprise projects: Why is It So Hard to Put Data Science in Production? and Your Deep Learning Startup for Enterprise Will Fail


A plea for your help to fund Data Machina: If you enjoy the weekly, full version of Data Machina, please help me fund it. Click here to subscribe before March 1 and get a 35% discount. I’m really thankful for all your help. After March 1, you don’t need to do anything to receive the free, short version of Data Machina every 2 weeks.


10 Link-o-Troned

  1. Gaussian Processes Are Not So Fancy

  2. A Sane Intro to Maximum Likelihood Estimation (MLE)

  3. Best Practices for Building Recommender Systems

  4. A Great Review of ACM RecSys 2018 Conference

  5. Tensors Considered Harmful: An Alternative

  6. A Review and Highlights of NLP 2018 [pdf, 50 pages]

  7. Uncertainty and Machine Learning: A Tutorial

  8. Causal Inference: Counterfactuals

  9. The Illustrated BERT… How NLP Cracked Transfer Learning

  10. [free course] Advances in Causality & Machine Learning

A Pythonista *Experience*

  1. Detecting Patterns & Anomalies in Massive Datasets

  2. StanfordNLP: SOTA, Multi-Language NLP in Python Torch

  3. A Very Simple Framework for State-of-the-Art NLP

Scripting aRt

  1. A Package for Faithful Dimensionality Reduction Visualisation

  2. Getting Started with Tensorflow Probability in R

  3. Feature Selection with Genetic Algorithms in R

Love from Julia

  1. A Library for Neural Differential Equations

  2. Intro to Bayesian Regression: Julia vs. Python & R

  3. Kubernetes with Julia

(Paren(th)ethical)

  1. Machine Learning in Clojure with XGBoost

  2. RTrees in Clojure

  3. MachineBox: Text & Image Classification in Clojure

ScalaTOR

  1. Spark Custom Stream Sources

  2. Type Safety and Spark Datasets in Scala

  3. Writing a Spark Dataframe to an Elasticsearch Index

data v-i-s-i-o-n-s

  1. ArviZ: Visual Exploratory Analysis of Bayesian Models

  2. A Visual Exploration of Gaussian Processes

  3. How to Visualise Decision Trees

Distributed de-Entangler

  1. Why is Storage on Kubernetes So Hard?

  2. Scaling Jupyter Notebooks with Kubernetes & Tensorflow

  3. Airflow: Lesser Known Tips, Tricks, and Best Practises

Blockchain Über Alles

  1. Ethereum Explained: Merkle Trees, Transactions & more

  2. An Intensive Introduction to Cryptography (Harvard Uni)

  3. Ethereum on DC/OS: Automate Blockchain Deployments

IoTea - everyThing/anyThing

  1. OpenEdge - Open Framework for Seamless Edge Computing

  2. Build Your Own IoT/MQTT node for Less than $2

  3. Hands-On Workshop on IoT with Arduino @IoTDevFest

Forschung!

  1. Generative Q&A: Learning to Answer the Whole Question

  2. Papers from Bayesian Deep Learning Workshop NIPS 2018

  3. Generative Ensembles for Robust Anomaly Detection

Algorithmic Potpourri

  1. [free book] Algorithms, Jeff Erickson (Dec, 2018)

  2. Divide and Conquer Algorithms

  3. Transition Matrix Clustering Algorithms 

Robots & Cyborgs like <you>

  1. A Biomimetic, Bionic Flying Fox

  2. Inside Dexter - The Groundbreaking Robotic Arm

  3. Supervising Robots with Brain & Muscle Signals

Deep & Other Learning Bits

  1. Deep Learning State of the Art (2019) - MIT Talks

  2. Alibaba’s Industrial Deep Learning for HighDim Sparse Data

  3. A Map of The Many Approaches to Reinforcement Learning

startups -> radar

  1. Anybotics - Industrial, Autonomous Quadruped Robots

  2. Basis AI - A Modern Platform for Enterprise AI

  3. Kasada - AI for Advanced Bot Detection and Defences

ML Datasets & Stuff

  1. The AI Reasoning Challenge Dataset - Allen AI Institute

  2. A Large-scale Dataset for Visual Learning & Image Captioning

  3. Face Diversities Recognition Dataset - IBM Research

Postscript, etc

Spread the word: share Data Machina with your friends

Tips? Suggestions? Feedback? Send email to Carlos

Curated by Carlos @ds_ldn in the middle of the night.