Open
Labels
Core: Modeling, Vision, WIP, bug
Description
System Info
- `transformers` version: 4.45.2
- Platform: Linux-6.8.0-51-generic-x86_64-with-glibc2.39
- Python version: 3.11.9
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4.4
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: no
- Using GPU in script?: yes
- GPU type: NVIDIA RTX A6000
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
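Run `_init_weights` (e.g. by instantiating a Mask2Former model from its config) and inspect the initialized parameters.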
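The report includes no script. As a hedged illustration that does not depend on a checkpoint, the two statistics discussed under "Expected behavior" can be contrasted in plain PyTorch. The function `hf_style_init`, the helper `make_layers`, and the layer sizes are hypothetical: they only mimic the reported behavior (embeddings drawn with std 0.02, zeroed biases) and are not the `transformers` code.

```python
import torch
from torch import nn

torch.manual_seed(0)


def hf_style_init(module: nn.Module) -> None:
    # Hypothetical init mimicking the reported behavior of Mask2Former's
    # _init_weights: embeddings ~ N(0, 0.02^2) and Linear biases set to zero.
    if isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    elif isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)


def make_layers() -> nn.ModuleDict:
    # Toy stand-ins: a query embedding table and one linear layer of a mask MLP.
    return nn.ModuleDict({
        "queries": nn.Embedding(100, 256),
        "mask_mlp_fc": nn.Linear(256, 256),
    })


default = make_layers()           # PyTorch defaults only
overridden = make_layers()
overridden.apply(hf_style_init)   # runs the hypothetical init on the container and every submodule

emb_std_default = default["queries"].weight.std().item()        # nn.Embedding default: N(0, 1)
emb_std_overridden = overridden["queries"].weight.std().item()
bias_default = default["mask_mlp_fc"].bias                      # nn.Linear default: small uniform, nonzero
bias_overridden = overridden["mask_mlp_fc"].bias

print(f"embedding std: default={emb_std_default:.3f}, overridden={emb_std_overridden:.3f}")
print(f"mask MLP bias all zero: default={bool((bias_default == 0).all())}, "
      f"overridden={bool((bias_overridden == 0).all())}")
```

On a real model, the same two checks can be applied to its embedding and mask-MLP layers after instantiation.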
Expected behavior
The `_init_weights` method of Mask2Former has multiple problems, for example:

- It initializes `nn.Embedding` layers with a std of 0.02, whereas the original Mask2Former code uses PyTorch's default initialization (std of 1.0).
- The mask MLP is incorrectly initialized with zero biases.
- The initialization of the multi-scale deformable attention is overwritten by the branch for `Mask2FormerPixelDecoderEncoderOnly`.
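The overwrite problem is an ordering effect. `nn.Module.apply` visits a module's children before the module itself, so if a parent's branch re-initializes every parameter beneath it, any specialized initialization already applied to a child is lost. The sketch below reproduces that pattern with hypothetical toy classes (`ToyDeformableAttention`, `ToyEncoderOnly`) and a hypothetical `init_weights` function; the constant 0.5 and the Xavier re-initialization are illustrative assumptions, not the `transformers` implementation.

```python
import torch
from torch import nn


class ToyDeformableAttention(nn.Module):
    # Hypothetical stand-in for a multi-scale deformable attention block.
    def __init__(self, dim: int = 8) -> None:
        super().__init__()
        self.sampling_offsets = nn.Linear(dim, dim)


class ToyEncoderOnly(nn.Module):
    # Hypothetical stand-in for an encoder that contains the attention block.
    def __init__(self, dim: int = 8) -> None:
        super().__init__()
        self.attn = ToyDeformableAttention(dim)
        self.proj = nn.Linear(dim, dim)


def init_weights(module: nn.Module) -> None:
    if isinstance(module, ToyDeformableAttention):
        # Specialized init for the attention block (illustrative value).
        nn.init.constant_(module.sampling_offsets.weight, 0.5)
    elif isinstance(module, ToyEncoderOnly):
        # Blanket re-init of every matrix under the encoder, including the
        # attention weights that were already initialized by the branch above.
        for p in module.parameters():
            if p.dim() > 1:
                nn.init.xavier_uniform_(p)


encoder = ToyEncoderOnly()
encoder.apply(init_weights)  # post-order: encoder.attn is visited before encoder

offsets = encoder.attn.sampling_offsets.weight
kept_special_init = bool(torch.all(offsets == 0.5))
print("attention init survived:", kept_special_init)  # prints: attention init survived: False
```

Because `apply` is post-order, reordering the `isinstance` branches would not help here; what causes the overwrite is that the parent's branch reaches into its children's parameters.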
NiccoloCavagnero