Jade Choghari

I'm a Robotics Engineer on the LeRobot team at Hugging Face 🤗, where I work on large-scale robot learning, simulation, and transformer-based policies. Before robotics, I contributed to computer vision and multimodal AI across text, images, audio, and video. I am also a Computer Science student at the University of Waterloo.

Email / CV / Scholar / Twitter / LinkedIn / Github

Research

I'm interested in robot learning, large-scale simulation, and vision-language-action (VLA) policies. My research focuses on building scalable environments, datasets, and transformer-based control systems that enable robots to perform complex manipulation and real-world tasks. Below is a selection of my research and open-source work.

Selected Projects

	X-VLA: Soft-Prompted Vision-Language-Action Foundation Model 2025 Built X-VLA in LeRobot. X-VLA is the first soft-prompted VLA model capable of scaling across many embodiments, cameras, action spaces, and environments through a unified transformer backbone. We release 6 checkpoints, including a cloth-folding model achieving 100% success over a 2-hour continuous run, along with a 1.5k-episode cloth-folding dataset to support community fine-tuning. Docs/Blog, PR, Twitter
	EnvHub: A Community Push to Scale Simulation Environments 2025 Launched EnvHub, a large-scale initiative to make simulation environments shareable and reusable across the community. It enables one-line loading of Isaac, MuJoCo, Genesis, and custom tasks into LeRobot, reviving the 2017 OpenAI Gym call-to-action for modern robot learning. Docs, Twitter
	LeRobot: Unified Evaluation Stack for VLA Benchmarks 2025 Built the evaluation stack for LeRobot, integrating LIBERO, MetaWorld, and other benchmarks to evaluate VLA models across 130+ manipulation tasks. Github, Twitter
	VLAb: Pretraining Toolkit for VLA Models 2025 A streamlined library for pretraining VLA models with multi-dataset support and distributed training. Built the pretraining stack used to reproduce SmolVLA and scale workflows across multi-GPU and SLURM clusters. Github
	LeRobot: Machine Learning Framework for Real-World Robotics 2025 An open-source PyTorch library for end-to-end robot learning, providing datasets, policies, simulation tools, and training pipelines across diverse robots. Github
	Gym-Genesis: GPU-Accelerated Simulation Environments for Robotics 2025 A vectorized Gym-style environment wrapper for the Genesis physics engine, enabling thousands of parallel environments on GPU for high-throughput robot learning. I developed core environment wrappers, improved observation pipelines, and added SO101 with imitation-learning demos. Github
	Roomi Robot: Open-Source Autonomous Housekeeping Robot 2025 A low-cost mobile manipulation robot for housekeeping tasks, combining a mobile base, dual arms, and multi-camera perception for tasks like towel replacement, trash collection, and restocking. Github
	SmolVLA: A Vision-Language-Action Model for Efficient Robotics 2025 A lightweight VLA model designed for affordable robot learning with scalable simulation and training pipelines. I built the simulation stack, developed scalable environments, and improved throughput, realism, and reproducibility for large-scale VLA training. Github
	RT-DETRv2: Improved Real-Time Detection Transformer 2025 An enhanced real-time DETR with selective multi-scale sampling, optional discrete attention for easier deployment, and dynamic data augmentation for stronger performance. I contributed the official RT-DETRv2 integration in Hugging Face Transformers, including model architecture, preprocessing, and training utilities. Hugging Face Transformers, LinkedIn
	TextNet: Fast Backbone for Arbitrary-Shaped Text Detection 2023 A NAS-designed backbone for detecting arbitrarily-shaped and rotated text, using asymmetric kernels to capture extreme aspect ratios. I contributed the TextNet integration in Hugging Face Transformers and added the TextNetForImageClassification variant. Hugging Face Transformers, Twitter
	Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs arXiv 2024 A multimodal LLM specialized for mobile UI screens, with strong referring, grounding, and reasoning abilities across fine-grained UI tasks. I integrated Ferret-UI into Hugging Face and contributed to the model and demo tooling. Hugging Face Transformers, Twitter
	VidToMe: Video Token Merging for Zero-Shot Video Editing arXiv 2024 Improves zero-shot video editing by merging redundant tokens across frames to enhance temporal consistency and reduce memory usage. I integrated VidToMe into Hugging Face and helped build the public demo. Github, Twitter
	Quality-Aware Masked Diffusion Transformer for Music Generation arXiv 2024 Introduces a quality-aware training framework and masked diffusion transformer for high-fidelity text-to-music generation, achieving SOTA results on MusicCaps and Song-Describer. I integrated QA-MDT into Hugging Face and contributed to model and demo tooling. Github, Twitter
	VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration 2024 Restores degraded speech using a unified flow-matching transformer trained on synthetic data, handling noise, reverberation, compression, and bandwidth artifacts. I integrated VoiceRestore into Hugging Face and contributed to the demo and tooling. Github, Twitter
	VFusion3D: Scalable 3D Generation from Video Diffusion Models ECCV 2024 Learns 3D assets from a single image using video diffusion models and synthetic multi-view data. I integrated VFusion3D into Hugging Face, built the full model architecture, and created the public Gradio demo. Github
	GTI: A Scalable Graph-Based Trajectory Imputation Method ACM SIGSPATIAL 2023 GTI is a scalable trajectory imputation approach that reconstructs sparse GPS trajectories without relying on existing maps, enabling data completion for map construction and urban mobility applications. Paper