LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/omnilmm 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

5,743
2.14 stars / hour

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

bin123apple/autocoder 23 May 2024

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark (90.9% vs. 90.2%).

Class-level Code Generation Code Completion +7

638
1.81 stars / hour

Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments

NVIDIA-Omniverse/Orbit 10 Jan 2023

We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim.

Imitation Learning Motion Planning +4

1,164
1.39 stars / hour

S³Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

nnanhuang/s3gaussian 30 May 2024

Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.

3D Reconstruction 3D Scene Reconstruction +1

207
1.34 stars / hour

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

aigc-apps/easyanimate 29 May 2024

The motion module can be adapted to various DiT baseline methods to generate video with different styles.

Image Generation Video Generation

459
1.25 stars / hour

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

vikparuchuri/marker 11 Jan 2021

We design models based off T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.

Language Modelling Question Answering

11,054
1.23 stars / hour

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

openmeshlab/meshxl 31 May 2024

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications.

Language Modelling Large Language Model

82
1.19 stars / hour

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

potamides/detikzify 24 May 2024

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.

Language Modelling

205
1.14 stars / hour

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

myniuuu/mofa-video 30 May 2024

We present MOFA-Video, an advanced controllable image animation method that generates video from a given image using various additional controllable signals (such as human landmark references, manual trajectories, or even another provided video) or their combinations.

Image Animation Video Generation

116
0.97 stars / hour

The Road Less Scheduled

facebookresearch/schedule_free 24 May 2024

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems.

Scheduling

1,437
0.96 stars / hour