Conversation

@surajyadav-research

What does this PR do?

This PR fixes LoRA integration for LongCatImagePipeline so that load_lora_weights() properly applies the adapter during inference and unload_lora_weights() cleanly restores the base (non-LoRA) behavior.

It also adds a slow regression test (sketched below) that:

  1. runs the pipeline without LoRA (baseline),
  2. loads a LoRA and verifies the output changes,
  3. unloads the LoRA and verifies the output returns close to the baseline.
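
For context, a condensed sketch of that round-trip check, assuming `LongCatImagePipeline` is exported from `diffusers`; the repo IDs, prompt, and tolerance below are illustrative placeholders, not the PR's actual values:

```py
# Hypothetical outline of the slow LoRA round-trip test; model IDs and the
# comparison tolerance are placeholders.
import numpy as np
import torch
from diffusers import LongCatImagePipeline


def test_lora_load_unload_round_trip():
    pipe = LongCatImagePipeline.from_pretrained(
        "org/longcat-image", torch_dtype=torch.bfloat16  # placeholder repo ID
    ).to("cuda")
    prompt = "a watercolor painting of a lighthouse"

    # 1. Baseline output without any LoRA.
    baseline = pipe(prompt, generator=torch.manual_seed(0), output_type="np").images[0]

    # 2. Loading a LoRA should change the output for the same seed.
    pipe.load_lora_weights("org/longcat-lora")  # placeholder repo ID
    with_lora = pipe(prompt, generator=torch.manual_seed(0), output_type="np").images[0]
    assert not np.allclose(baseline, with_lora, atol=1e-3)

    # 3. Unloading should restore the baseline behavior.
    pipe.unload_lora_weights()
    restored = pipe(prompt, generator=torch.manual_seed(0), output_type="np").images[0]
    assert np.allclose(baseline, restored, atol=1e-3)
```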

Why?

Addresses the reported LoRA load/unload issue for LongCat.

Tests

  • `RUN_SLOW=yes pytest -q tests/pipelines/longcat_image/test_longcat_lora.py`

@surajyadav-research
Author

Hi @sayakpaul
CI is currently “awaiting approval from a maintainer”. Could you please approve the workflow runs?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@surajyadav-research
Author

Hi @sayakpaul,
I’ve fixed the issue in the test file and pushed the update. Would appreciate your review when convenient.



```diff
-class LongCatImagePipeline(DiffusionPipeline, FromSingleFileMixin):
+class LongCatImagePipeline(DiffusionPipeline, FluxLoraLoaderMixin, FromSingleFileMixin):
```
@sayakpaul
Member

This seems quite incorrect to me.

Flux has two LoRA-loadable modules:

```py
_lora_loadable_modules = ["transformer", "text_encoder"]
```

For LongCat, the pipeline uses a different text encoder (Flux, for that matter, uses two text encoders), and the rest of its components also seem to differ from Flux:

```py
def __init__(
    self,
    scheduler: FlowMatchEulerDiscreteScheduler,
    vae: AutoencoderKL,
    text_encoder: Qwen2_5_VLForConditionalGeneration,
    tokenizer: Qwen2Tokenizer,
    text_processor: Qwen2VLProcessor,
    transformer: LongCatImageTransformer2DModel,
):
```

So, could you please explain how using the FluxLoraLoaderMixin is appropriate here?

Instead, I suggest we write a dedicated LoRA loader mixin class for LongCat -- LongCatLoraLoaderMixin. You can refer to

```py
class QwenImageLoraLoaderMixin(LoraBaseMixin):
```

as an example.
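
For reference, a minimal skeleton of what such a mixin could look like; everything below is an illustrative sketch modeled on QwenImageLoraLoaderMixin, not an actual implementation:

```py
# Illustrative skeleton only: a real mixin would reuse diffusers' shared LoRA
# state-dict helpers, mirroring QwenImageLoraLoaderMixin.
from diffusers.loaders.lora_base import LoraBaseMixin


class LongCatLoraLoaderMixin(LoraBaseMixin):
    r"""Load LoRA layers into LongCatImageTransformer2DModel."""

    # Assumption: only the transformer is LoRA-loadable; LongCat's
    # Qwen2.5-VL text encoder is deliberately excluded.
    _lora_loadable_modules = ["transformer"]
    transformer_name = "transformer"

    def load_lora_weights(self, pretrained_model_name_or_path_or_dict, adapter_name=None, **kwargs):
        # Placeholder: resolve the checkpoint to a flat state dict and route
        # "transformer."-prefixed keys into the LongCat transformer, following
        # the pattern in QwenImageLoraLoaderMixin.load_lora_weights.
        raise NotImplementedError("Sketch only; see QwenImageLoraLoaderMixin for the full pattern.")
```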

```diff
@@ -0,0 +1,107 @@
+# Copyright 2025 The HuggingFace Team.
```
@sayakpaul
Member, Dec 27, 2025

This is not needed. Please consult the existing testing structure for pipeline-level LoRA testing, cf. https://github.com/huggingface/diffusers/tree/main/tests/lora/

@surajyadav-research
Author

Thanks for the clarification @sayakpaul
I agree the current inheritance from FluxLoraLoaderMixin is misleading here.

While implementing this, my reasoning was that FluxLoraLoaderMixin mainly reuses the generic LoRA load/fuse path and simply assumes the loadable entry points are ["transformer", "text_encoder"]. But since LongCat’s text_encoder is Qwen2_5_VLForConditionalGeneration and the transformer is LongCatImageTransformer2DModel (i.e., not Flux’s text-encoder setup), this can easily lead to subtle issues like incorrect key routing, mismatched target modules, or silent partial loads.
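
For illustration, LoRA state dicts are routed by key prefix, which is where a Flux-oriented loader can go wrong on LongCat checkpoints; the key names below are hypothetical:

```py
# Hypothetical LoRA state-dict keys, shown only to illustrate prefix routing.
# A loader hard-wired to Flux's layout would route "text_encoder." keys toward
# Flux's CLIP/T5 text encoders, which do not match LongCat's Qwen2.5-VL model,
# risking mismatched target modules or silently skipped keys.
lora_keys = [
    "transformer.transformer_blocks.0.attn.to_q.lora_A.weight",
    "text_encoder.layers.0.self_attn.q_proj.lora_A.weight",
]
```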

So I’ll switch to a dedicated LongCatLoraLoaderMixin to make the intent explicit and handle any LongCat-specific routing cleanly.

alvarobartt and others added 7 commits January 4, 2026 21:31
…v5.0+) (huggingface#12877)

Use `T5Tokenizer` instead of `MT5Tokenizer`

Given that the `MT5Tokenizer` in `transformers` is just a "re-export" of
`T5Tokenizer` as per
https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/models/mt5/tokenization_mt5.py
(on the latest available stable Transformers, i.e. v4.57.3), this commit
updates the imports to point to `T5Tokenizer` instead, so that they
still work with Transformers v5.0.0rc0 onwards.
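
The swap itself is a one-line import change; a minimal illustration (the checkpoint ID is only an example):

```py
# Before (breaks once transformers v5.0 drops the MT5Tokenizer re-export):
# from transformers import MT5Tokenizer
# After: T5Tokenizer is the underlying class on both v4.x and v5.x.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")  # example checkpoint
```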
* Add z-image-omni-base implementation

* Merged into one transformer for Z-Image.

* Fix bugs for controlnet after merging the main branch new feature.

* Fix for auto_pipeline, Add Styling.

* Refactor noise handling and modulation

- Add select_per_token function for per-token value selection
- Separate adaptive modulation logic
- Clean up t_noisy/clean variable naming
- Move image_noise_mask handler from forward to pipeline

* Styling & Formatting.

* Rewrite code with more non-forward funcs & a cleaner forward.

1. Change to one forward with shorter code, with omni code (None).
2. Split out non-forward funcs: _build_unified_sequence, _prepare_sequence, patchify, pad.

* Styling & Formatting.

* Manual check fix-copies in controlnet, Add select_per_token, _patchify_image, _pad_with_ids; Styling.

* Add Import in pipeline __init__.py.

---------

Co-authored-by: Jerry Qilong Wu <xinglong.wql@alibaba-inc.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* fix torchao quantizer for new torchao versions

Summary:

`torchao==0.16.0` (not yet released) has some BC-breaking changes; this
PR updates the diffusers repo for those changes. Specifics on the
changes:
1. `UInt4Tensor` is removed: pytorch/ao#3536
2. old float8 tensors v1 are removed: pytorch/ao#3510

In this PR:
1. move the logger variable up (not sure why it was in the middle of the
   file before) to get better error messages
2. gate the old torchao objects by torchao version (a generic sketch of this
   gating follows)
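
A generic sketch of that gating pattern; the guarded import path is an assumption based on the changelog above, not the PR's exact code:

```py
# Illustrative version gate; the UInt4Tensor import path is an assumption
# (the class was removed in torchao 0.16.0, per pytorch/ao#3536).
import torchao
from packaging import version

if version.parse(torchao.__version__) < version.parse("0.16.0"):
    from torchao.dtypes import UInt4Tensor
else:
    UInt4Tensor = None  # removed upstream; downstream code must handle None
```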

Test Plan:

importing diffusers objects with new versions of torchao works:

```bash
> python -c "import torchao; print(torchao.__version__); from diffusers import StableDiffusionPipeline"
0.16.0.dev20251229+cu129
```


* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>