Data Machina #231

Generative AI 3D Models. ReconFusion. LucidDreamer. DreaMo. HiFi4G. Mamba-chat. Apple MLX. Zephyr AIF. Pearl RL AI Agent. Schrödinger Bridges

Dec 10, 2023

Generative AI 3D Models. I keep meeting colleagues in finance and insurance who are quite frustrated about the time, cost, and complexity involved in developing production-ready, enterprise RAG apps. Many of them tell me about their nightmarish MLOps scenarios: a RAG pipeline full of “unreliable/inconsistent” dependencies and moving parts like LangChain, LlamaIndex, GPT Functions, embeddings, vector DBs/search, external AI model APIs, hallucinations, fine-tuning, LoRA…

In the meantime, I meet people in media and marketing who are absolutely loving GenAI. Which brings me to Generative AI 3D models. There's some amazing stuff happening in this area. Many of these GenAI 3D models combine methods like Gaussian splatting, diffusion models, and Neural Radiance Fields (NeRF). Here’s the latest on GenAI 3D models.

Real-world scene reconstruction in 3D. Researchers at Google and Columbia University just introduced ReconFusion: a new method that reconstructs real-world scenes from just a few images. It uses a diffusion prior to regularize a NeRF-based 3D reconstruction pipeline at novel camera poses beyond those captured by the set of input images. According to the paper, ReconFusion outperforms previous NeRF-based methods in the few-view setting. Check out the paper and watch the video demos: ReconFusion: 3D Reconstruction with Diffusion Priors.

3D scene generation without domain limitations. Until recently, GenAI 3D scene models limited the target scene to a specific domain, primarily because their training strategies relied on 3D scan datasets. LucidDreamer is a new domain-free scene generation pipeline that fully leverages the power of large-scale diffusion models. Check out the paper, code and demos: LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes.

Articulated 3D reconstruction from one video. Existing methods for articulated 3D generation are expensive because they require a lot of work from domain experts, while template-free learning methods that use, for example, monocular video require full coverage of all viewpoints of the subject in the input video. DreaMo is a new GenAI model that addresses both issues. Check out the paper and code: DreaMo: Articulated 3D Reconstruction From A Single Casual Video, and the demo video below.

Generation of photo-real human models. Rendering photo-real human models efficiently, and building the rasterisation pipeline that requires, is challenging and expensive. A group of researchers just introduced HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Check out the paper and watch the awesome video demo: HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting.

Generating high-quality, animated human avatars. Meta AI researchers just introduced Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. The geometry model, based on 3D Gaussians, can capture 3D-consistent sub-millimeter face details of the avatar. The model can be efficiently relit in real time under both point light and continuous illumination. Check out the paper and watch the amazing demos: Relightable Gaussian Codec Avatars.

AI activities for the weekend. I love this collaborative, community game project: How (not) to get hit by a self-driving car. Can an AI detect you on the street? If you win, you have a dilemma: either train the AI or not. Every player’s win generates vital data that exposes the AI's inability to detect pedestrians and highlights its flaws, data which could potentially be used to improve self-driving cars in the future.