@enki enki commented Aug 1, 2025

Current Behavior

When multiple Nx processes attempt to read the cached project graph simultaneously, a race
condition can occur that results in the error "[readCachedProjectGraph]: No cached ProjectGraph is available". This
happens because:

  1. Process A starts building the project graph and creates a lock file
  2. Process B tries to read the cached graph while A is still building
  3. Process B finds no valid cached graph (since A hasn't finished) and fails
  4. Subsequent processes cascade into the same failure

The current implementation lacks proper synchronization between processes during graph
building and reading operations.

While hard to reproduce, this has been encountered by multiple users: #31648

Expected Behavior

With this PR, processes now wait for graph building to complete instead of failing immediately:

  1. Process A acquires an exclusive lock and builds the graph (unchanged)
  2. Process B now calls wait_sync(), which blocks until the exclusive lock is released (previously it
    failed immediately)
  3. Process B successfully reads the completed graph after Process A finishes
  4. Subsequent processes either wait (if graph is building) or read immediately (if graph is ready)

The key change is adding synchronous waiting behavior to the existing OS-level file locking. Processes that
previously failed with "No cached ProjectGraph is available" now wait for the graph to be ready, eliminating
the race condition.
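
The reader-side flow in steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `GraphLock` interface, `readCachedGraphWithWait`, `CacheReader`, and `simulateBuilder` are hypothetical names, and the in-memory lock only stands in for the native FileLock so the snippet runs without the Nx native bindings. Only the `waitSync()` method name and the error message come from the PR.

```typescript
// Hypothetical shape of the native lock binding. Only waitSync() is named in the PR.
interface GraphLock {
  // Blocks the calling thread until no other process holds the exclusive lock.
  waitSync(): void;
}

interface ProjectGraph {
  nodes: Record<string, unknown>;
}

// Hypothetical cache reader: returns the graph, or null if nothing has been written yet.
type CacheReader = () => ProjectGraph | null;

function readCachedGraphWithWait(
  lock: GraphLock,
  readCache: CacheReader
): ProjectGraph {
  // Step 2: wait for any in-progress build to release its exclusive lock.
  lock.waitSync();

  // Step 3: if a builder was running, it has finished writing by now.
  const graph = readCache();
  if (graph === null) {
    // Nobody built the graph at all, so the original error still applies.
    throw new Error(
      "[readCachedProjectGraph]: No cached ProjectGraph is available"
    );
  }
  return graph;
}

// In-memory stand-in for "Process A": the graph becomes available
// while the reader is blocked inside waitSync().
function simulateBuilder(): { lock: GraphLock; readCache: CacheReader } {
  let cached: ProjectGraph | null = null;
  const lock: GraphLock = {
    waitSync(): void {
      cached = { nodes: { app: {} } };
    },
  };
  return { lock, readCache: () => cached };
}
```

With the simulated builder, a read that would have found an empty cache before waiting returns the graph afterwards. A reader that waits on a lock nobody holds and still finds no cache fails with the same error as before, which matches the "read immediately or wait" split in step 4.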

Related Issue(s)

Fixes #31648

Implementation Details

This PR is a second attempt after #32162 was rejected based on feedback by @AgentEnder.
This PR's approach properly uses OS-level read/write locks rather than checking for lock file existence.

  • Added wait_sync() method to FileLock that blocks until an exclusive lock is released
  • Uses fs4::fs_std::FileExt::lock_shared() to block until any exclusive lock is released
  • Integrated into readCachedProjectGraph() to synchronize multi-process access
  • Cross-platform solution using the fs4 crate's file locking primitives

The fix has been validated in production environments where the race condition was
reproducible, and the error no longer occurs with these changes.

enki and others added 10 commits July 30, 2025 13:29
When multiple processes call readCachedProjectGraph() simultaneously
before the cache exists, they would all fail with "No cached ProjectGraph
is available" error. This was common when running parallel builds.

The fix adds synchronous waiting when a lock file exists, indicating
another process is building the graph. Uses Atomics.wait() for efficient
blocking without CPU usage.

- Wait up to 120 seconds for graph to be built
- Check every 200ms for graph availability
- Detect if other process fails (lock removed, no graph)
- Provide helpful timeout message with lock cleanup instructions

Fixes nrwl#31648

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
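
For reference, the polling strategy described in the commit message above (block with Atomics.wait(), check every 200ms, give up after 120 seconds, detect a builder that exited without producing a graph) could look roughly like this. It is a sketch under stated assumptions: `sleepSync`, `waitForGraphSync`, the `WaitResult` values, and the path parameters are hypothetical and not taken from the commit. It also relies on lock-file existence, which the PR description says was later replaced by OS-level locks.

```typescript
import * as fs from "node:fs";

// Sleep synchronously without spinning the CPU. The cell is never notified,
// so Atomics.wait simply blocks until the timeout elapses.
function sleepSync(ms: number): void {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, ms);
}

interface WaitOptions {
  timeoutMs: number; // total time to wait before giving up
  intervalMs: number; // delay between checks
}

// Hypothetical result values for the three outcomes listed in the commit message.
type WaitResult = "ready" | "builder-failed" | "timeout";

function waitForGraphSync(
  lockPath: string,
  graphPath: string,
  opts: WaitOptions = { timeoutMs: 120000, intervalMs: 200 }
): WaitResult {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    // Read the lock state first, then the graph. Assuming the builder writes
    // the graph before removing its lock, "no lock, no graph" then really
    // means the builder went away without producing a graph.
    const lockHeld = fs.existsSync(lockPath);
    if (fs.existsSync(graphPath)) {
      return "ready";
    }
    if (!lockHeld) {
      return "builder-failed";
    }
    sleepSync(opts.intervalMs);
  }
  return "timeout";
}
```

A caller would map "timeout" to the helpful message with lock cleanup instructions and "builder-failed" to the original error. Because a crashed builder can leave a stale lock file behind, this approach depends on the timeout to recover, which is one reason to prefer locks that the operating system releases when the process exits.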
…OOO/nx into fix/project-graph-race-condition-v2
- Add waitSync() method to native FileLock class for synchronous blocking
- Fix race condition where readCachedProjectGraph fails when another process is building
- Properly wait for OS-level file locks instead of checking file existence

Addresses nrwl#31648
Add synchronous lock acquisition to prevent race conditions when multiple processes
attempt to read the cached project graph simultaneously. The sync method uses
blocking I/O to ensure exclusive access during graph deserialization.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@enki enki requested review from a team as code owners August 1, 2025 13:20
@enki enki requested review from leosvelperez and AgentEnder August 1, 2025 13:20

vercel bot commented Aug 1, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| nx-dev | ✅ Ready (Inspect) | Visit Preview | Aug 1, 2025 1:31pm |

Contributor

nx-cloud bot commented Aug 1, 2025

View your CI Pipeline Execution ↗ for commit 2056d5f

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx affected --targets=lint,test,build,e2e,e2e-c... | ✅ Succeeded | 56m 23s | View ↗ |
| nx run-many -t check-imports check-commit check... | ✅ Succeeded | 1m 52s | View ↗ |
| nx-cloud record -- nx-cloud conformance:check | ✅ Succeeded | 2s | View ↗ |
| nx-cloud record -- nx format:check | ✅ Succeeded | 6s | View ↗ |
| nx-cloud record -- nx sync:check | ✅ Succeeded | 5s | View ↗ |
| nx documentation | ✅ Succeeded | 5m 29s | View ↗ |

☁️ Nx Cloud last updated this comment at 2025-08-01 14:28:10 UTC

Contributor Author

enki commented Aug 7, 2025

Could we have a review here, @AgentEnder @leosvelperez? This made all my crashes go away, and I've been using it without problems for a week.


moinerus commented Aug 12, 2025

Would love for this to be looked at; the majority of my team is having issues with this on 21.3.5.

Contributor Author

enki commented Aug 12, 2025

JFYI this fix still works solidly on our project. @AgentEnder @leosvelperez

Contributor Author

enki commented Aug 12, 2025

@moinerus can you confirm that this still happens on 21.3.11 (latest release)?

@montella1507

Same problem here.

We need to run "rm -rf .angular && rm -rf .nx && rm -rf dist && rm -rf tmp" before almost every build.

@moinerus

> @moinerus can you confirm that this still happens on 21.3.11 (latest release)?

OK, we did the minor bump to 21.3.11 and it looks like the issue is resolved.

Development

Successfully merging this pull request may close these issues.

[readCachedProjectGraph] ERROR: No cached ProjectGraph is available
3 participants