Data Machina #159

A free weekly digest of AI/ML curiosities and other amenities

Jul 24, 2022

On Deep Learning and Tabular Data. Is it worth all the effort, complexity, and cost of Deep Learning for supervised learning tasks with tabular data? Are tree-ensemble methods faster, more accurate and more cost-efficient for tabular data than DL? Let me share some interesting stuff.

Back in 2020, Google published a paper titled TabNet: Attentive Interpretable Tabular Learning, in which the team claimed that TabNet outperforms previous work across tabular datasets from different domains (pdf).

But since then, many people have disagreed with the TabNet paper's results. In 2021, Michael Clark wrote an excellent summary of findings on DL for tabular data. His conclusion: Definitely, Deep Learning is Not All You Need for Tabular Data.

In June 2022, the DSAR group at the University of Tübingen published a new paper: Deep Neural Nets and Tabular Data: A Survey. They claim that algorithms based on gradient-boosted tree ensembles still mostly outperform DL models on supervised learning tasks (pdf).

And just recently, a team at Inria Saclay & Sorbonne University (some of them from the scikit-learn team) published Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?, in which they provide a systematic benchmark and results showing that tree-based models remain state-of-the-art on medium-sized data, even without accounting for their superior speed (pdf).

US & UK competition on Privacy-Preserving ML (PPML). As per my previous post, DM #158, PPML is likely to become part of the regulatory framework for financial services. A few days ago, the US & UK announced a competition on Financial Crime, Healthcare and Privacy-Preserving Federated Learning. From the type and level of the agencies & regulators involved, you can tell PPML is at the top of the agenda. All the details are here: Privacy Enhancing Tech Innovation Competition.

10 Link-o-Troned

A Pythonista Experience

beCause of Dennis & Bjarne

Scripting aRt

Love from Julia

(Paren(th)ethical)

ScalaTOR

data v-i-s-i-o-n-s

Distributed de-Entangler

Forschung!

Algorithmic Potpourri

Robots & Cyborgs like <you>

Deep & Other Learning Bits

startups -> radar

ML Datasets & Stuff

Postscript, etc