-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Added changes to populate file handle stats in Cluster Info #20260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds per-node ProcessStats collection and exposure: marks ProcessStats as PublicApi, extends ClusterInfo to carry nodeProcessStats, updates InternalClusterInfoService to request and store process stats, and updates tests and mocks to handle the new ClusterInfo field and constructor arity. Changes
Sequence Diagram(s)sequenceDiagram
participant Node
participant InternalClusterInfoService as InfoService
participant NodesStatsAPI as NodesStats
participant ClusterState as ClusterInfoStore
rect rgba(120,180,240,0.08)
Note over InfoService,NodesStats: periodic refresh() cycle
InfoService->>NodesStats: send NodesStatsRequest(metrics: PROCESS, FS, OS, ... )
NodesStats-->>InfoService: NodeStats per node (includes ProcessStats)
end
rect rgba(160,220,160,0.06)
Note over InfoService,ClusterInfoStore: assemble ClusterInfo and publish
InfoService->>InfoService: extract NodeStats.getProcess() -> nodeProcessStats map
InfoService->>ClusterInfoStore: update ClusterInfo(nodeProcessStats=...)
ClusterInfoStore-->>InfoService: ack / cluster state updated
end
rect rgba(240,200,120,0.06)
Note over InfoService,InfoService: on node removal or failure
Node-->>InfoService: node removed
InfoService->>InfoService: prune nodeProcessStats entry
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/cluster/ClusterInfo.java (1)
100-111: Deprecated constructor discards thenodeProcessStatsparameter.The constructor accepts
nodeProcessStatsas a parameter (line 107) but the delegation call on line 110 ignores it and passesMap.of()instead:this(..., nodeFileCacheStats, Map.of(), Map.of()); // ^^^^^^^^ ^^^^^^^^ // nodeResourceUsageStats nodeProcessStats (ignored!)This will cause silent data loss for any caller using this deprecated constructor.
@Deprecated(forRemoval = true) public ClusterInfo( final Map<String, DiskUsage> leastAvailableSpaceUsage, final Map<String, DiskUsage> mostAvailableSpaceUsage, final Map<String, Long> shardSizes, final Map<ShardRouting, String> routingToDataPath, final Map<NodeAndPath, ReservedSpace> reservedSpace, final Map<String, AggregateFileCacheStats> nodeFileCacheStats, final Map<String, ProcessStats> nodeProcessStats ) { - this(leastAvailableSpaceUsage, mostAvailableSpaceUsage, shardSizes, routingToDataPath, reservedSpace, nodeFileCacheStats,Map.of(), Map.of()); + this(leastAvailableSpaceUsage, mostAvailableSpaceUsage, shardSizes, routingToDataPath, reservedSpace, nodeFileCacheStats, Map.of(), nodeProcessStats); }
🧹 Nitpick comments (5)
server/src/internalClusterTest/java/org/opensearch/cluster/ClusterInfoServiceIT.java (1)
58-59: New process-stats IT covers end-to-end collection, with one portability caveatThe new integration test correctly exercises
InternalClusterInfoService’snodeProcessStatscollection and asserts a healthy shape (two entries, positive timestamp, CPU/mem non-null). One thing to double-check: on any supported platforms whereProcessStatsmight report “unknown” values (e.g.,-1for file descriptor counts), the>= 0/> 0assertions could become brittle. If such platforms are in scope for your CI, consider relaxing those checks or explicitly documenting the expected semantics.Also applies to: 256-281
server/src/test/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitorTests.java (1)
1402-1402: Minor formatting inconsistency: extra space before comma.There's an extra space before the comma:
Map.of() , Map.of(). This should beMap.of(), Map.of()for consistency.- return new ClusterInfo(diskUsages, null, shardSizes, null, reservedSpace, fileCacheStats, Map.of() , Map.of()); + return new ClusterInfo(diskUsages, null, shardSizes, null, reservedSpace, fileCacheStats, Map.of(), Map.of());server/src/main/java/org/opensearch/cluster/ClusterInfo.java (1)
77-79: Remove redundant blank lines.There are multiple consecutive blank lines that should be reduced to a single blank line for consistency with the rest of the codebase.
private final Map<String, ProcessStats> nodeProcessStats; - - - + private long avgTotalBytes;server/src/main/java/org/opensearch/cluster/InternalClusterInfoService.java (2)
120-121: Minor formatting: remove trailing space before semicolon.There's an extra space before the semicolon on line 120.
- private volatile Map<String, ProcessStats> nodeProcessStats ; - + private volatile Map<String, ProcessStats> nodeProcessStats;
329-329: Minor formatting: trailing whitespace.Remove trailing spaces after the statement on line 329.
- nodeProcessStats = Map.of(); + nodeProcessStats = Map.of();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
server/src/internalClusterTest/java/org/opensearch/cluster/ClusterInfoServiceIT.java(4 hunks)server/src/main/java/org/opensearch/cluster/ClusterInfo.java(11 hunks)server/src/main/java/org/opensearch/cluster/InternalClusterInfoService.java(8 hunks)server/src/main/java/org/opensearch/monitor/process/ProcessStats.java(4 hunks)server/src/test/java/org/opensearch/cluster/ClusterInfoTests.java(6 hunks)server/src/test/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitorTests.java(1 hunks)server/src/test/java/org/opensearch/cluster/routing/allocation/IndexShardConstraintDeciderOverlapTests.java(1 hunks)server/src/test/java/org/opensearch/cluster/routing/allocation/RemoteShardsBalancerBaseTestCase.java(1 hunks)server/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderTests.java(1 hunks)server/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderUnitTests.java(3 hunks)server/src/test/java/org/opensearch/indices/tiering/TieringRequestValidatorTests.java(4 hunks)server/src/test/java/org/opensearch/snapshots/SnapshotResiliencyTests.java(1 hunks)test/framework/src/main/java/org/opensearch/cluster/MockInternalClusterInfoService.java(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-12T13:31:41.214Z
Learnt from: andriyredko
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:642-646
Timestamp: 2025-12-12T13:31:41.214Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), the createBuilderWithPort() helper intentionally uses randomBoolean() for the HTTP/3 enabled setting to ensure all tests validate correct behavior with both HTTP/3 enabled and disabled configurations.
Applied to files:
server/src/test/java/org/opensearch/cluster/ClusterInfoTests.java
🧬 Code graph analysis (2)
server/src/main/java/org/opensearch/cluster/InternalClusterInfoService.java (1)
server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java (1)
NodeStats(81-658)
server/src/main/java/org/opensearch/monitor/process/ProcessStats.java (1)
server/src/main/java/org/opensearch/cluster/ClusterInfo.java (2)
PublicApi(66-505)PublicApi(415-503)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (15)
server/src/test/java/org/opensearch/snapshots/SnapshotResiliencyTests.java (1)
466-480: ClusterInfo ctor update correctly supplies empty nodeProcessStatsThe extended
ClusterInfoconstruction (adding the finalMap.of()) preserves argument ordering and types and cleanly initializesnodeProcessStatsas empty, which is appropriate for this test.server/src/test/java/org/opensearch/cluster/routing/allocation/RemoteShardsBalancerBaseTestCase.java (1)
286-295: DevNullClusterInfo now correctly initializes nodeProcessStatsPassing
Map.of()as the final super-argument cleanly initializesnodeProcessStatswhile preserving existing behavior for disk usage, shard sizes, and node resource stats.server/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderUnitTests.java (1)
128-139: All ClusterInfo call sites updated consistently for nodeProcessStatsEach
ClusterInfoconstruction here now passes an 8thMap.of()argument fornodeProcessStats, with prior parameters (leastAvailableUsages,mostAvailableUsage,shardSizes, optionalroutingToDataPath,reservedSpace,nodeFileCacheStats,nodeResourceUsageStats) remaining in the correct order.Also applies to: 215-224, 341-350
test/framework/src/main/java/org/opensearch/cluster/MockInternalClusterInfoService.java (1)
159-171: SizeFakingClusterInfo correctly forwards new ClusterInfo fieldsForwarding both
delegate.getNodeResourceUsageStats()anddelegate.getNodeProcessStats()into the extendedClusterInfoconstructor keeps the mock wrapper behavior aligned with the realClusterInfostructure.server/src/test/java/org/opensearch/cluster/routing/allocation/IndexShardConstraintDeciderOverlapTests.java (1)
173-182: DevNullClusterInfo wired up with empty nodeProcessStatsThe updated
super(...)call correctly passesnodeResourceUsageStatsas the 7th argument and an emptyMapasnodeProcessStats, maintaining previous semantics while satisfying the new constructor.server/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderTests.java (1)
1586-1603: DevNullClusterInfo constructor chain handles nodeProcessStats correctlyThe root
DevNullClusterInfoconstructor now passesnodeResourceUsagesand an emptynodeProcessStatsMapintoClusterInfo, and the shorter overloads delegate to it without behavior changes. This keeps all existing tests valid under the extendedClusterInfoAPI.server/src/internalClusterTest/java/org/opensearch/cluster/ClusterInfoServiceIT.java (1)
397-414: Node process stats cleared and recovered in error pathExtending the clear-on-error test to assert
getNodeProcessStats().size()is0on failure and2after recovery keeps the new metric’s lifecycle aligned with disk, shard, and resource stats.server/src/test/java/org/opensearch/cluster/ClusterInfoTests.java (2)
57-81: Good test coverage for serialization.The serialization test properly includes randomized
ProcessStatsand verifies the data survives the round-trip. The size comparison at line 80 is appropriate sinceProcessStatsdoesn't implementequals().
166-183: Test helper correctly generates randomized ProcessStats.The helper method properly creates
ProcessStatsinstances with randomized values for CPU percent, CPU total time, memory, and file descriptors. This provides good coverage for the serialization tests.server/src/test/java/org/opensearch/indices/tiering/TieringRequestValidatorTests.java (1)
174-174: Constructor updates align with new ClusterInfo signature.All
ClusterInfoconstructor invocations correctly add the emptyMap.of()for the newnodeProcessStatsparameter. These tests don't require process stats data, so empty maps are appropriate.Also applies to: 188-188, 205-205, 308-308
server/src/main/java/org/opensearch/monitor/process/ProcessStats.java (1)
145-148: Consistent public API exposure for inner classes.The
MemandCpuinner classes are appropriately annotated with the same@PublicApiversion as the parentProcessStatsclass. This ensures consumers can rely on these types as part of the stable API surface.Also applies to: 173-176
server/src/main/java/org/opensearch/cluster/InternalClusterInfoService.java (4)
209-213: LGTM!The cleanup logic for
nodeProcessStatscorrectly follows the same pattern used for other node-related maps, ensuring removed node entries are properly pruned.
226-237: LGTM!The
getClusterInfo()method correctly passes the newnodeProcessStatsmap to theClusterInfoconstructor, consistent with the existing pattern for other node statistics.
244-255: LGTM!Adding the
PROCESSmetric to theNodesStatsRequestis the correct way to fetch process statistics from nodes.
301-306: LGTM!The null-check filter for
nodeStats.getProcess()correctly handles the nullableProcessStatsfield documented inNodeStats. The stream-based collection follows the same pattern used elsewhere in this method.
|
❌ Gradle check result for c8b279f: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/cluster/InternalClusterInfoService.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/monitor/process/ProcessStats.java
Outdated
Show resolved
Hide resolved
9e7a6b4 to
c8b279f
Compare
Signed-off-by: tanyabti <tanyabti@amazon.com>
Signed-off-by: tanyabti <tanyabti@amazon.com>
b2c319e to
9ddf2c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/cluster/ClusterInfo.java (1)
96-117: Deprecated ClusterInfo constructor silently discards nodeProcessStatsThe deprecated constructor accepts
nodeProcessStatsbut then ignores it when delegating:this( leastAvailableSpaceUsage, mostAvailableSpaceUsage, shardSizes, routingToDataPath, reservedSpace, nodeFileCacheStats, Map.of(), // nodeResourceUsageStats Map.of() // nodeProcessStats (should forward param) );Any callers still using this ctor will lose process stats unexpectedly, which is surprising even for deprecated API and can break consumers relying on it.
Consider forwarding the argument while still defaulting
nodeResourceUsageStatsto an empty map, e.g.:this( leastAvailableSpaceUsage, mostAvailableSpaceUsage, shardSizes, routingToDataPath, reservedSpace, nodeFileCacheStats, Map.of(), nodeProcessStats );
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/cluster/InternalClusterInfoService.java (1)
116-121: nodeProcessStats lifecycle in InternalClusterInfoService is wired correctly
- Metric is requested in
updateNodeStats, populated fromNodeStats.getProcess()inrefresh(), included ingetClusterInfo(), pruned on data‑node removal, and cleared on failure alongside other maps. This mirrors existing patterns for disk and resource usage stats.- Minor optional: you call
adjustNodesStats(nodesStatsResponse.getNodes())three times inonResponse; caching that result in a local variable would avoid redundant work, though it’s unlikely to be noticeable here.Also applies to: 131-137, 191-214, 225-237, 245-252, 301-305, 329-330
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
server/src/internalClusterTest/java/org/opensearch/cluster/ClusterInfoServiceIT.javaserver/src/main/java/org/opensearch/cluster/ClusterInfo.javaserver/src/main/java/org/opensearch/cluster/InternalClusterInfoService.javaserver/src/main/java/org/opensearch/monitor/process/ProcessStats.javaserver/src/test/java/org/opensearch/cluster/ClusterInfoTests.javaserver/src/test/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitorTests.javaserver/src/test/java/org/opensearch/cluster/routing/allocation/IndexShardConstraintDeciderOverlapTests.javaserver/src/test/java/org/opensearch/cluster/routing/allocation/RemoteShardsBalancerBaseTestCase.javaserver/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderTests.javaserver/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderUnitTests.javaserver/src/test/java/org/opensearch/indices/tiering/TieringRequestValidatorTests.javaserver/src/test/java/org/opensearch/snapshots/SnapshotResiliencyTests.javatest/framework/src/main/java/org/opensearch/cluster/MockInternalClusterInfoService.java
🚧 Files skipped from review as they are similar to previous changes (6)
- server/src/test/java/org/opensearch/cluster/routing/allocation/IndexShardConstraintDeciderOverlapTests.java
- server/src/test/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitorTests.java
- server/src/test/java/org/opensearch/cluster/ClusterInfoTests.java
- server/src/test/java/org/opensearch/cluster/routing/allocation/RemoteShardsBalancerBaseTestCase.java
- test/framework/src/main/java/org/opensearch/cluster/MockInternalClusterInfoService.java
- server/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderTests.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/main/java/org/opensearch/monitor/process/ProcessStats.java (1)
server/src/main/java/org/opensearch/cluster/ClusterInfo.java (2)
PublicApi(65-511)PublicApi(421-509)
🔇 Additional comments (6)
server/src/test/java/org/opensearch/snapshots/SnapshotResiliencyTests.java (1)
466-481: ClusterInfo constructor arity update looks correctThe new 8‑arg
ClusterInfoinvocation passesnodeFileCacheStatsplus empty maps for the new resource/process stats parameters in the correct order; behavior of this test remains unchanged and focused on file cache stats.server/src/test/java/org/opensearch/cluster/routing/allocation/decider/DiskThresholdDeciderUnitTests.java (1)
128-139: ClusterInfo constructor extensions wired correctly in disk threshold testsAll three updated
ClusterInfoconstructions now supply the additionalnodeResourceUsageStatsandnodeProcessStatsmap arguments as empty maps, preserving the previous behavior while matching the new 8‑parameter signature.Also applies to: 215-224, 341-350
server/src/main/java/org/opensearch/cluster/ClusterInfo.java (1)
71-82: nodeProcessStats integration (field, ctor, IO, XContent, accessor) looks consistent
- New
nodeProcessStatsfield is threaded through the primary constructor and default ctor correctly.- Serialization/deserialization use the same
Version.V_3_4_0gate on both read and write paths.- XContent adds a dedicated
node_process_statsobject per node and delegates toProcessStats.toXContent, matching the pattern used fornode_resource_usage_stats.getNodeProcessStats()returns an unmodifiable view, consistent with other getters.Aside from the deprecated‑ctor forwarding noted separately, the rest of the wiring for process stats in
ClusterInfolooks solid.Also applies to: 129-149, 171-183, 217-233, 255-267, 332-337
server/src/internalClusterTest/java/org/opensearch/cluster/ClusterInfoServiceIT.java (1)
58-59: ProcessStats collection IT and error‑path assertions are well‑scopedThe new integration test validates per‑node
ProcessStatsshape and count, and the extended clear‑on‑error test correctly asserts thatnodeProcessStatsis emptied on failure and repopulated after recovery. This reuses the same “2 data nodes” assumption as the existing disk/resource tests, so behavior stays consistent across metrics.Also applies to: 256-281, 401-414
server/src/main/java/org/opensearch/monitor/process/ProcessStats.java (1)
35-36: PublicApi annotations for ProcessStats and inner classes are appropriateMarking
ProcessStats,Mem, andCpuas@PublicApi(since = "3.4.0")and updating the Javadoc tags aligns with their new exposure throughClusterInfo.getNodeProcessStats()and the existing PublicApi pattern in core classes.Also applies to: 46-52, 142-148, 170-176
server/src/test/java/org/opensearch/indices/tiering/TieringRequestValidatorTests.java (1)
171-176: Tiering tests correctly adapted to extended ClusterInfo constructorAll updated
ClusterInfoconstructions now supply empty maps for file‑cache, resource‑usage, and process stats in the correct parameter order, keeping the tests focused on disk/shard sizing behavior without altering semantics.Also applies to: 185-191, 202-206, 306-309
|
❌ Gradle check result for 9ddf2c8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20260 +/- ##
============================================
- Coverage 73.28% 73.19% -0.09%
+ Complexity 71723 71687 -36
============================================
Files 5785 5785
Lines 328137 328163 +26
Branches 47270 47275 +5
============================================
- Hits 240465 240203 -262
- Misses 68397 68697 +300
+ Partials 19275 19263 -12 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Coverage is still low. Please fix. |
|
❌ Gradle check result for c792e4b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/routing/allocation/DiskThresholdMonitor.java
Outdated
Show resolved
Hide resolved
|
❌ Gradle check result for dd8e10c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
dd8e10c to
86163bf
Compare
|
❌ Gradle check result for 3141550: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
Added process statistics to ClusterInfo to enable cluster-wide visibility into node-level metrics including open file descriptors, max file descriptors, CPU, and memory.
Related Issues
Resolves #20111
Check List
Summary by CodeRabbit
New Features
API Updates
Tests
✏️ Tip: You can customize this high-level summary in your review settings.