- Johns Hopkins University
- Maryland, USA
- https://vibashan.github.io/
Stars
AGENTS.md — a simple, open format for guiding coding agents
Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA.
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
World Modeling by Forecasting Vision Foundation Model Features
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
A framework for efficient model inference with omni-modality models
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
Devkit and documentation for the NVIDIA Physical AI Autonomous Vehicles Dataset
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
A character-level language diffusion model trained on Tiny Shakespeare
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
A minimal implementation of DeepMind's Genie world model
RynnVLA-002: A Unified Vision-Language-Action and World Model
Training framework with a goal to explore the frontier of sample efficiency of small language models
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
SEED-Voken: A Series of Powerful Visual Tokenizers
Implementation of MagViT2 Tokenizer in PyTorch
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model




