Trending Research

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/omnilmm • 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

5,743

2.14 stars / hour

Paper
Code

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

bin123apple/autocoder • 23 May 2024

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark test (90.9% vs. 90.2%).

Class-level Code Generation, Code Completion, +7

638

1.81 stars / hour

Paper
Code

Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments

NVIDIA-Omniverse/Orbit • 10 Jan 2023

We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim.

Imitation Learning, Motion Planning, +4

1,164

1.39 stars / hour

Paper
Code

S³Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

nnanhuang/s3gaussian • 30 May 2024

Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.

3D Reconstruction, 3D Scene Reconstruction, +1

207

1.34 stars / hour

Paper
Code

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

aigc-apps/easyanimate • 29 May 2024

The motion module can be adapted to various DiT baseline methods to generate video with different styles.

Image Generation, Video Generation

459

1.25 stars / hour

Paper
Code

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

vikparuchuri/marker • 11 Jan 2021

We design models based on T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.

Language Modelling, Question Answering

11,054

1.23 stars / hour

Paper
Code

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

openmeshlab/meshxl • 31 May 2024

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications.

Language Modelling, Large Language Model

1.19 stars / hour

Paper
Code

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

potamides/detikzify • 24 May 2024

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.

Language Modelling

205

1.14 stars / hour

Paper
Code

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

myniuuu/mofa-video • 30 May 2024

We present MOFA-Video, an advanced controllable image animation method that generates video from a given image using various additional control signals (such as human landmark references, manual trajectories, or even another provided video) or their combinations.

Image Animation, Video Generation

116

0.97 stars / hour

Paper
Code

The Road Less Scheduled

facebookresearch/schedule_free • 24 May 2024

Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly outperformed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems, ranging from convex problems to large-scale deep learning problems.

Scheduling

1,437

0.96 stars / hour

Paper
Code