refactor(query): Refactor virtual columns storage & read planning #19284

b41sh · 2026-01-18T03:08:55Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR focuses on two core refactors:

Shared column consolidation: Sparse virtual columns (present in fewer than 30% of rows) are merged into a single shared map column. All low-density JSON fields live inside one Variant-backed map, which reduces the number of physical columns without dropping any data.
Read planning: Virtual columns are now reconstructed via four VirtualColumnReadPlan types, so that any JSON shape can be materialized from the virtual dataset:

Direct: read a materialized virtual column by name.
Example: v['id'] exists as a dedicated column.
Shared: read a sparse path from the shared map by key index.
Example: v['user']['extra'] only appears in a few rows, so it is stored in the shared map.
Object: reconstruct a parent object from child plans.
Example: v['user'] is built from v['user']['id'], v['user']['name'], and v['user']['info'].
FromParent: read a variant parent column and extract a suffix with get_by_keypath.
Example: v['user']['info']['tags'][0] is derived from the nearest variant parent (e.g. v['user']['info']) and then extracted by keypath.
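
The routing described above can be sketched in a few lines. This is a minimal illustration only, not the code in this PR: `ReadPlan`, `Catalog`, `build`, `is_sparse`, `SPARSE_RATIO`, and the dotted path form (`user.info` for v['user']['info']) are all hypothetical names chosen for the example, and the real VirtualColumnReadPlan may carry different fields and resolve plans in a different order.

```rust
use std::collections::BTreeSet;

// Threshold quoted in the PR description; the constant name is hypothetical.
const SPARSE_RATIO: f64 = 0.30;

// Illustrative stand-in for the four plan kinds. The real
// `VirtualColumnReadPlan` in Databend may carry different fields.
#[derive(Debug, Clone, PartialEq)]
enum ReadPlan {
    Direct { name: String },
    Shared { key_index: usize },
    Object { children: Vec<(String, ReadPlan)> },
    FromParent { parent: Box<ReadPlan>, suffix: String },
}

// Hypothetical per-block catalog of stored paths.
#[derive(Default)]
struct Catalog {
    direct: Vec<String>, // paths with a dedicated column
    shared: Vec<String>, // sparse paths; position is the key index
}

fn is_sparse(present_rows: usize, total_rows: usize) -> bool {
    total_rows > 0 && (present_rows as f64) / (total_rows as f64) < SPARSE_RATIO
}

// Route each path to a dedicated column or to the shared map.
fn build(counts: &[(&str, usize)], total_rows: usize) -> Catalog {
    let mut cat = Catalog::default();
    for (path, present) in counts {
        if is_sparse(*present, total_rows) {
            cat.shared.push(path.to_string());
        } else {
            cat.direct.push(path.to_string());
        }
    }
    cat
}

impl Catalog {
    // Direct or Shared plan for a path that is physically stored.
    fn stored(&self, path: &str) -> Option<ReadPlan> {
        if self.direct.iter().any(|d| d.as_str() == path) {
            return Some(ReadPlan::Direct { name: path.to_string() });
        }
        self.shared
            .iter()
            .position(|s| s.as_str() == path)
            .map(|key_index| ReadPlan::Shared { key_index })
    }

    fn plan(&self, path: &str) -> Option<ReadPlan> {
        if let Some(plan) = self.stored(path) {
            return Some(plan);
        }
        // Object: rebuild a parent from every stored descendant path.
        let prefix = format!("{path}.");
        let mut keys: BTreeSet<&str> = BTreeSet::new();
        for s in self.direct.iter().chain(self.shared.iter()) {
            if let Some(rest) = s.strip_prefix(prefix.as_str()) {
                keys.insert(rest.split('.').next().unwrap());
            }
        }
        if !keys.is_empty() {
            let children = keys
                .into_iter()
                .filter_map(|k| Some((k.to_string(), self.plan(&format!("{prefix}{k}"))?)))
                .collect();
            return Some(ReadPlan::Object { children });
        }
        // FromParent: walk up to the nearest stored ancestor, keep the suffix.
        let mut end = path.len();
        while let Some(pos) = path[..end].rfind('.') {
            if let Some(parent) = self.stored(&path[..pos]) {
                let suffix = path[pos + 1..].to_string();
                return Some(ReadPlan::FromParent { parent: Box::new(parent), suffix });
            }
            end = pos;
        }
        None
    }
}
```

With a catalog built from `id`, `user.id`, `user.name`, and `user.info` as dense paths and `user.extra` as a sparse one, `plan("user")` yields an Object plan over four children, while `plan("user.info.tags.0")` yields a FromParent plan on `user.info` with the suffix `tags.0`. In the real reader that suffix would be handed to get_by_keypath; the sketch only records it.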

Other changes

Read meta directly from virtual parquet: all virtual column metadata is derived from the parquet file, avoiding VirtualBlockMeta mismatches caused by column_id drift.
Reworked binder: missing virtual columns now get temporary column ids when they are synthesized (e.g. object reconstruction, shared-map fields), so planning always has stable IDs.
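
One way to picture the temporary-id behaviour in the binder is a small allocator that returns the stored id when a path has one and otherwise hands out a stable temporary id. The sketch below is an assumption-laden illustration, not the binder from this PR: `VirtualColumnIds`, `resolve`, `is_temporary`, and the `TEMP_ID_BASE` value are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical id type and reserved range for temporary ids.
type ColumnId = u32;
const TEMP_ID_BASE: ColumnId = 4_000_000_000;

// Hypothetical helper: maps virtual column paths to column ids.
struct VirtualColumnIds {
    known: HashMap<String, ColumnId>, // ids of physically stored virtual columns
    temp: HashMap<String, ColumnId>,  // ids handed out for synthesized columns
    next_temp: ColumnId,
}

impl VirtualColumnIds {
    fn new(known: HashMap<String, ColumnId>) -> Self {
        Self { known, temp: HashMap::new(), next_temp: TEMP_ID_BASE }
    }

    // Return the stored id if the path has one; otherwise allocate a
    // temporary id once and reuse it on every later lookup.
    fn resolve(&mut self, path: &str) -> ColumnId {
        if let Some(id) = self.known.get(path) {
            return *id;
        }
        if let Some(id) = self.temp.get(path) {
            return *id;
        }
        let id = self.next_temp;
        self.next_temp += 1;
        self.temp.insert(path.to_string(), id);
        id
    }
}

// True when the id came from the temporary range.
fn is_temporary(id: ColumnId) -> bool {
    id >= TEMP_ID_BASE
}
```

The point of the sketch is that repeated lookups of the same synthesized path (for example a reconstructed object or a shared-map field) return the same id, so later planning stages can refer to it consistently.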

Notes: This refactor does not attempt to keep compatibility with old virtual-column parquet metadata.

fixes: #[Link the issue here]

Tests

Unit Test
Logic Test
Benchmark Test
No Test - Explain why

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

b41sh · 2026-01-28T08:29:10Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50895a9572

src/query/storages/fuse/src/pruning/virtual_column_pruner.rs

github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Jan 18, 2026

b41sh force-pushed the feat-virtual-column-new branch 2 times, most recently from 9cdc421 to 122a5a5 Compare January 27, 2026 06:52

b41sh added 3 commits January 28, 2026 12:09

refactor(query): Virtual column support shared values

becf6d8

add virtual column read plan

30f5065

fix shared columns

50895a9

b41sh force-pushed the feat-virtual-column-new branch from 122a5a5 to 50895a9 Compare January 28, 2026 04:10

b41sh changed the title ~~refactor(query): Virtual column support shared values~~ refactor(query): Refactor virtual columns storage & read planning Jan 28, 2026

b41sh marked this pull request as ready for review January 28, 2026 08:26

b41sh requested a review from sundy-li January 28, 2026 08:27

b41sh requested a review from dantengsky January 28, 2026 08:29

chatgpt-codex-connector bot reviewed Jan 28, 2026

src/query/storages/fuse/src/pruning/virtual_column_pruner.rs (resolved)

sundy-li approved these changes Feb 2, 2026

Merge branch 'main' into feat-virtual-column-new

6de6b00

b41sh merged commit d90c93c into databendlabs:main Feb 2, 2026
89 checks passed

2 participants
