In this paper, we tackle the high computational overhead of transformers for lightweight image super-resolution (SR). Motivated by our observation that self-attention maps repeat across layers, we introduce a convolutionized self-attention module named Convolutional Attention (ConvAttn), which emulates self-attention's long-range modeling capability and instance-dependent weighting with a single shared large kernel and dynamic kernels. By utilizing the ConvAttn module, we significantly reduce the reliance on self-attention and its memory-bound operations while maintaining the representational capability of transformers. Furthermore, we overcome the challenge of integrating flash attention into the lightweight SR regime, effectively mitigating self-attention's inherent memory bottleneck. Rather than proposing an intricate self-attention module, we scale the window size up to 32×32 with flash attention, significantly improving PSNR by 0.31 dB on Urban100×2 while reducing latency and memory usage by 16× and 12.2×. Building on these approaches, our proposed network, termed Emulating Self-attention with Convolution (ESC), notably improves PSNR by 0.27 dB on Urban100×4 compared to HiT-SRF, reducing latency and memory usage by 3.7× and 6.2×, respectively. Extensive experiments demonstrate that ESC retains the long-range modeling ability, data scalability, and representational power of transformers despite most self-attention layers being replaced by the ConvAttn module.
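For intuition, the sketch below shows one way the ConvAttn idea could look in PyTorch: a large depthwise kernel, intended to be shared across layers, handles long-range modeling, while a small dynamic kernel predicted per input supplies instance-dependent weighting. This is a minimal illustration under our own assumptions (the module name, kernel sizes, and dynamic-kernel head are ours, not the paper's exact design); see the source in this repository for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAttnSketch(nn.Module):
    """Illustrative-only sketch of the ConvAttn idea: a shared large
    depthwise kernel for long-range context plus a per-instance dynamic
    kernel for input-dependent weighting. Shapes and sizes are assumptions."""

    def __init__(self, dim: int, large_kernel: int = 13, dyn_kernel: int = 3):
        super().__init__()
        self.large_kernel = large_kernel
        self.dyn_kernel = dyn_kernel
        # One large depthwise kernel; in a full network the same tensor
        # would be shared by every layer that uses ConvAttn.
        self.shared_large_kernel = nn.Parameter(
            torch.randn(dim, 1, large_kernel, large_kernel) * 0.02
        )
        # Lightweight head predicting a small depthwise kernel per sample.
        self.to_dyn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim * dyn_kernel * dyn_kernel, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Long-range modeling: depthwise conv with the shared large kernel.
        out = F.conv2d(x, self.shared_large_kernel,
                       padding=self.large_kernel // 2, groups=c)
        # Instance-dependent weighting: predict a per-sample depthwise kernel
        # and apply it via grouped conv over the flattened batch.
        dyn = self.to_dyn(x).reshape(b * c, 1, self.dyn_kernel, self.dyn_kernel)
        out = out + F.conv2d(x.reshape(1, b * c, h, w), dyn,
                             padding=self.dyn_kernel // 2,
                             groups=b * c).reshape(b, c, h, w)
        return out
```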
This repository is an official implementation of the paper "Emulating Self-attention with Convolution for Efficient Image Super-Resolution", ICCV 2025,
by Dongheon Lee, Seokju Yun, and Youngmin Ro
[Paper] [Supp] [Pre-trained Models]
- [2025-12-31] ESC now supports FlashAttention via `F.scaled_dot_product_attention` using FlashBias [NeurIPS 2025]. We provide FlashBias implementations for the `esc` and `esc_real` architectures and release pre-trained weights. Do not use the FlashBias version for academic evaluation.
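For intuition, the snippet below sketches the core trick that lets biased attention run through the fused `F.scaled_dot_product_attention` kernel: if the attention bias factorizes as a low-rank product B = U Vᵀ (the FlashBias setting), concatenating the factors to the queries and keys reproduces softmax(QKᵀ/√d + B)V exactly. The function name and shapes are our assumptions for illustration, not this repository's API.

```python
import math
import torch
import torch.nn.functional as F

def sdpa_with_lowrank_bias(q, k, v, bias_u, bias_v):
    """Compute softmax(q @ k^T / sqrt(d) + bias_u @ bias_v^T) @ v on the
    fused SDPA path by folding the low-rank bias into q and k.

    Assumed shapes (illustrative):
      q, k, v:        (batch, heads, seq, d)
      bias_u, bias_v: (batch, heads, seq, rank), bias = bias_u @ bias_v^T
    """
    d = q.shape[-1]
    scale = 1.0 / math.sqrt(d)
    # SDPA multiplies the whole q @ k^T product by `scale`, so pre-divide
    # one bias factor to leave the bias term unscaled.
    q_aug = torch.cat([q, bias_u / scale], dim=-1)
    k_aug = torch.cat([k, bias_v], dim=-1)
    # Whether the flash backend is actually selected still depends on
    # dtype, device, and head dimension.
    return F.scaled_dot_product_attention(q_aug, k_aug, v, scale=scale)

# Tiny usage example with random tensors.
q = torch.randn(1, 4, 64, 32)
k = torch.randn(1, 4, 64, 32)
v = torch.randn(1, 4, 64, 32)
u = torch.randn(1, 4, 64, 8)
w = torch.randn(1, 4, 64, 8)
out = sdpa_with_lowrank_bias(q, k, v, u, w)  # (1, 4, 64, 32)
```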
PSNR / SSIM comparison between the Flex Attention and FlashBias variants on the standard benchmarks:

| Method | Set5 | Set14 | B100 | Urban100 | Manga109 |
|---|---|---|---|---|---|
| ESC (Flex Attention) | 38.35 / 0.9619 | 34.11 / 0.9223 | 32.41 / 0.9027 | 33.46 / 0.9395 | 39.54 / 0.9790 |
| ESC (FlashBias) | 38.35 / 0.9619 | 34.06 / 0.9221 | 32.41 / 0.9027 | 33.43 / 0.9392 | 39.53 / 0.9790 |
No-reference IQA metrics and efficiency of the real-world variants:

| Method | NIQE (↓) | MANIQA (↑) | MUSIQ (↑) | CLIPIQA (↑) | Latency | Memory Usage |
|---|---|---|---|---|---|---|
| ESC-Real (Flex Attention) | 4.0556 | 0.3553 | 62.98 | 0.5796 | 59.9 ms | 715.9 MB |
| ESC-Real (FlashBias) | 3.9649 | 0.3503 | 62.56 | 0.5659 | 51.1 ms | 730.2 MB |
Installation:

```bash
git clone https://github.com/dslisleedh/ESC.git
cd ESC
conda create -n esc python=3.10
conda activate esc
pip3 install torch torchvision torchaudio  # PyTorch 2.6.0 with CUDA 12.4
pip install -r requirements.txt
python setup.py develop
```

Training with a single GPU:

```bash
python esc/train.py -opt $CONFIG_PATH
```

Distributed training with 4 GPUs:

```bash
PYTHONPATH="./:${PYTHONPATH}" CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
    --nproc_per_node=4 --master_port=5612 \
    esc/train.py -opt $CONFIG_PATH --launcher pytorch
```

Training on a Slurm cluster:

```bash
PYTHONPATH="./:${PYTHONPATH}" GLOG_vmodule=MemcachedClient=-1 srun -p $PARTITION --mpi=pmi2 \
    --gres=$GPUS --ntasks=4 --cpus-per-task=$CPUS --kill-on-bad-exit=1 \
    python -u esc/train.py -opt $CONFIG_PATH --launcher="slurm"
```

Testing:

```bash
python esc/test.py -opt $CONFIG_PATH
```

The DFLIP dataset consists of four datasets: DIV2K, Flickr2K, LSDIR, and DiverSeg-IP.
We leverage the DFLIP datasets to demonstrate our method's data scalability.

This work is based on BasicSR and HAT. We thank them for their great work and for sharing the code.
If you find this code useful for your research, please consider citing the following paper:
```bibtex
@InProceedings{Lee_2025_ICCV,
    author    = {Lee, Dongheon and Yun, Seokju and Ro, Youngmin},
    title     = {Emulating Self-attention with Convolution for Efficient Image Super-Resolution},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {24467-24477}
}
```