WoRA integration into PEFT #2872
Conversation
…pha/beta parameters
- Add WoRA to test variant map in test_lora_variants.py
- Add test case for WoRA variant application to all layer types
- Add test for WoRA alpha/beta parameter gradients
- Fix WoRA parameter initialization in Embedding.update_layer
- Fix WoRA parameter initialization in _ConvNd.update_layer
- Fix WoraEmbeddingLayer to include alpha in computation
- Fix WoraConvNdLayer gradient flow for alpha/beta parameters
- Transpose embedding matrices in WoraEmbeddingVariant.forward
- Add embed_scale support in WoraEmbeddingVariant

All WoRA tests now pass successfully.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

No, not stale. And since when did PRs start going stale?
WoRA (Weighted-Direction Low-Rank Adaptation) Implementation for PEFT
Summary
This pull request adds support for WoRA (Weighted-Direction Low-Rank Adaptation), a novel extension of DoRA that introduces learnable scalar parameters (alpha and beta) to create a weighted combination of the base weights and LoRA adapters. WoRA provides more fine-grained control over the adaptation process compared to standard LoRA and DoRA.
Fixes #2861
Analysis and Understanding
WoRA Formula
WoRA extends DoRA by introducing two learnable scalar parameters, alpha and beta.
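A plausible form of the update, assuming the α/β-weighted combination of base weight and LoRA delta is normalized DoRA-style (the authoritative definition lives in `wora.py`):

$$
W' = m \cdot \frac{\beta\, W_0 + \alpha\, BA \cdot \text{scaling}}{\lVert\, \beta\, W_0 + \alpha\, BA \cdot \text{scaling}\,\rVert}
$$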
Where:

- `m` is the learned magnitude vector (from DoRA)
- `W₀` is the base weight matrix
- `BA` is the LoRA decomposition (B × A)
- `α` (alpha) controls the LoRA contribution
- `β` (beta) controls the base weight contribution
- `scaling` is the LoRA scaling factor

Key Insights
- LoraVariant Pattern: The existing DoRA implementation uses a clean separation between:
  - layer classes (`wora.py`) that handle forward computation
  - variant classes (`variants.py`) that handle initialization and variant-specific logic
- Parameter Naming Convention: PEFT automatically marks parameters as trainable if their names contain "lora_". This is why we use `lora_wora_alpha` and `lora_wora_beta`; a minimal illustration follows this list.
- ParameterDict Storage: Using `nn.ParameterDict` ensures the parameters are registered per adapter and treated like any other module parameter (device placement, state dict inclusion).
- Layer-Specific Challenges: Linear, Embedding, and ConvNd layers each need their own handling (see "Key Technical Challenges and Solutions" below).
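For illustration, a simplified version of the trainability rule behind this naming convention (the same idea as PEFT's helper, not its actual implementation):

```python
import torch.nn as nn

def mark_only_lora_as_trainable(model: nn.Module) -> None:
    # Freeze everything except parameters whose name contains "lora_";
    # lora_wora_alpha and lora_wora_beta therefore stay trainable automatically.
    for name, param in model.named_parameters():
        param.requires_grad = "lora_" in name
```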
Implementation Approach
1. Core Architecture (wora.py)
Created four main layer classes:
- `WoraLinearLayer`: Base implementation for linear transformations
- `WoraEmbeddingLayer`: Handles token embeddings with proper matrix transposition
- `_WoraConvNdLayer`: Base class for convolutional layers
- `WoraConv1dLayer`, `WoraConv2dLayer`, `WoraConv3dLayer`: Specialized conv layers

Key Design Decisions:
- Alpha and beta enter the weight-norm calculation as detached scalars (`.item()`) to avoid affecting the norm computation, while staying as tensors elsewhere in the forward pass (a sketch of the full forward pass follows below).
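A minimal sketch of how these decisions could fit together, assuming the weighted formulation above; the class and attribute layout here is illustrative and not the PR's exact `WoraLinearLayer`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WoraLinearSketch(nn.Module):
    """Illustrative WoRA-style linear layer: weighted base + LoRA update with a DoRA-style norm."""

    def __init__(self, base: nn.Linear, r: int = 8, scaling: float = 1.0):
        super().__init__()
        self.base = base
        self.scaling = scaling
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        # DoRA-style magnitude vector plus the two WoRA scalars.
        self.lora_magnitude = nn.Parameter(base.weight.norm(p=2, dim=1).detach())
        self.lora_wora_alpha = nn.Parameter(torch.tensor(1.0))
        self.lora_wora_beta = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.lora_B @ self.lora_A * self.scaling
        combined = self.lora_wora_beta * self.base.weight + self.lora_wora_alpha * delta
        # Detach the row-wise norm so it rescales the weights without itself
        # contributing gradients (mirrors the ".item()/detach in the norm" decision).
        weight_norm = combined.norm(p=2, dim=1, keepdim=True).detach()
        new_weight = self.lora_magnitude.view(-1, 1) * combined / weight_norm
        return F.linear(x, new_weight, self.base.bias)
```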
2. Variant Classes (variants.py)

Implemented five variant classes following PEFT's LoraVariant pattern:
- `WoraLinearVariant`
- `WoraEmbeddingVariant`
- `WoraConv1dVariant`, `WoraConv2dVariant`, `WoraConv3dVariant`

Each variant handles:
- `init()`: Creating and initializing WoRA-specific parameters
- `forward()`: Calling the appropriate layer forward method
- `merge_safe()`/`merge_unsafe()`: Merging adapters with base weights
- `unmerge()`: Restoring original weights
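As a structural sketch only (method names from the list above; argument names are assumptions, and the real signatures follow PEFT's `LoraVariant` base class):

```python
class WoraLinearVariantSketch:
    """Skeleton only; the PR's WoraLinearVariant implements these hooks for real."""

    @staticmethod
    def init(module, adapter_name, **kwargs):
        ...  # create lora_wora_alpha / lora_wora_beta (and the magnitude vector) for this adapter

    @staticmethod
    def forward(module, active_adapter, x, result):
        ...  # delegate to the WoRA layer's forward computation and return the new result

    @staticmethod
    def merge_safe(module, active_adapter, orig_weight):
        ...  # return a merged copy of the base weight

    @staticmethod
    def unmerge(module, active_adapter, orig_weight):
        ...  # undo the merge, restoring the original base weight
```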
3. Parameter Initialization (layer.py)

Modified three key methods to initialize WoRA parameters:
- `LoraLayer.update_layer()`: Base implementation for Linear layers
- `Embedding.update_layer()`: Special handling for embedding layers
- `_ConvNd.update_layer()`: Handling for convolutional layers

Initialization Pattern:
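Roughly, the pattern looks like the following sketch; the helper name and the initial value of 1.0 are assumptions, and the authoritative code lives in the `update_layer()` methods:

```python
import torch
import torch.nn as nn

def init_wora_params(layer: nn.Module, adapter_name: str) -> None:
    # Lazily create the ParameterDicts on the LoRA layer, then add one scalar
    # per adapter. The "lora_" prefix keeps them trainable (see Key Insights).
    if not hasattr(layer, "lora_wora_alpha"):
        layer.lora_wora_alpha = nn.ParameterDict()
        layer.lora_wora_beta = nn.ParameterDict()
    layer.lora_wora_alpha[adapter_name] = nn.Parameter(torch.tensor(1.0))
    layer.lora_wora_beta[adapter_name] = nn.Parameter(torch.tensor(1.0))
    # Explicit, matching the fix for the overriding update_layer() methods.
    layer.lora_wora_alpha[adapter_name].requires_grad_(True)
    layer.lora_wora_beta[adapter_name].requires_grad_(True)
```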
4. Configuration (config.py)
Added a `use_wora` boolean flag to `LoraConfig` with proper validation:

- Defaults to `False` for backward compatibility
- Set to `True` to enable WoRA
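A usage sketch assuming this PR's branch is installed; `use_wora` does not exist in released PEFT versions, and the model name and hyperparameters are placeholders:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    use_wora=True,  # the flag added by this PR; defaults to False
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```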
5. Testing (test_lora_variants.py)

Added comprehensive tests:
- `test_variant_is_applied_to_layers`: Verifies WoRA variants are correctly applied to all layer types
- `test_wora_params_have_gradients`: Ensures alpha and beta parameters receive gradients during backpropagation

Key Technical Challenges and Solutions
Challenge 1: Gradient Flow for Alpha and Beta
Problem: The initial implementation used `.item()` to convert the alpha/beta Parameters to Python scalars throughout the computation, breaking gradient flow.

Solution: Keep alpha and beta as tensors in the forward computation so autograd can track them, and use detached scalars only where the weight norm is computed (see the illustration below).
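A tiny standalone illustration of the underlying PyTorch behavior (not PR code):

```python
import torch

alpha = torch.nn.Parameter(torch.tensor(2.0))
w = torch.randn(4, 4)

out_bad = (alpha.item() * w).sum()   # alpha.item() is a plain float: no graph edge back to alpha
out_good = (alpha * w).sum()         # keeps alpha in the autograd graph

out_good.backward()
print(alpha.grad)                    # populated; with out_bad it would have stayed None
```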
Challenge 2: Embedding Layer Matrix Dimensions
Problem: Embedding layers store lora_embedding_A and lora_embedding_B with shapes that need transposition before use.
Solution: Transpose the matrices before use, i.e. work with `lora_embedding_A.T` and `lora_embedding_B.T` in the WoRA embedding computation.
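For reference, a small shape sketch assuming PEFT's usual embedding LoRA layout (`lora_embedding_A` of shape `(r, num_embeddings)`, `lora_embedding_B` of shape `(embedding_dim, r)`); the PR's exact shapes may differ:

```python
import torch

num_embeddings, embedding_dim, r = 100, 32, 8
lora_embedding_A = torch.randn(r, num_embeddings)
lora_embedding_B = torch.randn(embedding_dim, r)

# The delta weight must match the embedding table's (num_embeddings, embedding_dim)
# layout, hence the transposition before combining it with the base weight.
delta_w = lora_embedding_A.T @ lora_embedding_B.T
assert delta_w.shape == (num_embeddings, embedding_dim)
```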
Challenge 3: Parameter Initialization in Override Methods

Problem: The `Embedding` and `_ConvNd` classes override `update_layer()` without calling `super()`, so they missed the WoRA parameter initialization.

Solution: Initialize the WoRA parameters directly in these overridden methods and call `requires_grad_(True)` to ensure trainability.

Challenge 4: Conv Layer Forward Pass
Problem: Convolutional layers have more complex forward logic with bias handling and reshaping requirements.
Solution: Implement the bias handling and weight-norm reshaping once in `_WoraConvNdLayer`, keeping alpha and beta in the computation graph so their gradients flow through the conv forward pass.
Verification and Testing
Test Coverage
The implementation includes two parametrized tests that cover:
Variant Application Test: Verifies that the WoRA variant is attached to every supported layer type (Linear, Embedding, Conv1d/Conv2d/Conv3d).

Gradient Flow Test: Verifies that `lora_wora_alpha` and `lora_wora_beta` receive gradients during backpropagation.
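A condensed sketch of what the gradient-flow check amounts to; this is illustrative pytest-style code with a placeholder model and config, not the PR's actual test:

```python
import torch
from torch import nn
from peft import LoraConfig, get_peft_model

def test_wora_scalars_receive_gradients():
    base = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 4))
    # use_wora is the flag introduced by this PR.
    config = LoraConfig(r=4, target_modules=["0", "1"], use_wora=True)
    model = get_peft_model(base, config)

    out = model(torch.randn(2, 16))
    out.sum().backward()

    for name, param in model.named_parameters():
        if "lora_wora_alpha" in name or "lora_wora_beta" in name:
            assert param.grad is not None, f"{name} has no gradient"
```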
Test Results
All tests pass successfully.

Files Modified
- `src/peft/tuners/lora/config.py`: Added `use_wora` configuration parameter
- `src/peft/tuners/lora/layer.py`: Added WoRA parameter initialization in `update_layer` methods
- `src/peft/tuners/lora/wora.py`: Implemented WoRA layer classes
- `src/peft/tuners/lora/variants.py`: Implemented WoRA variant classes
- `tests/test_lora_variants.py`: Added comprehensive WoRA tests

Backward Compatibility
This implementation maintains full backward compatibility: `use_wora` defaults to `False`, so existing LoRA and DoRA configurations behave exactly as before.
cc: @BenjaminBossan