@b41sh b41sh commented Jan 18, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR focuses on two core refactors:

  1. Shared column consolidation: Sparse virtual columns (presence < 30% of rows) are merged into a single shared map column. This keeps all low‑density JSON fields inside one Variant-backed map, reducing the number of physical columns while still preserving all data.
  2. Read planning: virtual columns are now reconstructed via four VirtualColumnReadPlan types, so any JSON shape can be materialized from the virtual dataset:
  • Direct: read a materialized virtual column by name.
    Example: v['id'] exists as a dedicated column.
  • Shared: read a sparse path from the shared map by key index.
    Example: v['user']['extra'] only appears in a few rows, so it is stored in the shared map.
  • Object: reconstruct a parent object from child plans.
    Example: v['user'] is built from v['user']['id'], v['user']['name'], and v['user']['info'].
  • FromParent: read a variant parent column and extract a suffix with get_by_keypath.
    Example: v['user']['info']['tags'][0] is derived from the nearest variant parent (e.g. v['user']['info']) and then extracted by keypath.
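
The 30% rule from item 1 can be illustrated with a minimal sketch. The names `Placement`, `place_paths`, and `SHARED_THRESHOLD_PERCENT` are hypothetical and not taken from the PR; only the "presence < 30% of rows" cutoff comes from the description above, and the actual code may handle edge cases (such as empty blocks) differently.

```rust
use std::collections::BTreeMap;

/// Where a JSON path is stored (hypothetical names, not Databend's API).
#[derive(Debug, PartialEq)]
enum Placement {
    /// Dense path: gets its own physical virtual column.
    Dedicated,
    /// Sparse path: stored as an entry in the single shared map column.
    Shared,
}

/// Cutoff from the PR description: presence < 30% of rows counts as sparse.
const SHARED_THRESHOLD_PERCENT: usize = 30;

/// Classify each path by the share of rows in which it appears.
/// `presence` maps a path to the number of rows that contain it.
fn place_paths(
    presence: &BTreeMap<&str, usize>,
    total_rows: usize,
) -> BTreeMap<String, Placement> {
    presence
        .iter()
        .map(|(path, &rows_present)| {
            // Integer form of rows_present / total_rows < 30 / 100.
            let sparse = rows_present * 100 < total_rows * SHARED_THRESHOLD_PERCENT;
            let placement = if sparse {
                Placement::Shared
            } else {
                Placement::Dedicated
            };
            (path.to_string(), placement)
        })
        .collect()
}
```

With 100 rows, a path present in every row would stay a dedicated column, while a path present in only 5 rows would go to the shared map.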
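
The four read plans can be sketched over a toy JSON type to show how they compose. This is an illustration under stated assumptions, not the PR's implementation: `Json`, `Key`, `ReadPlan`, `Row`, and `eval` are hypothetical names, the local `get_by_keypath` is a stand-in for the real function of the same name, and skipping absent children during object reconstruction is an assumed behavior.

```rust
use std::collections::BTreeMap;

/// Toy JSON value standing in for a Variant (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(i64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// One step of a key path: an object field or an array index.
enum Key {
    Name(String),
    Index(usize),
}

/// The four plan kinds named in the PR description (fields are assumptions).
enum ReadPlan {
    Direct { column: String },
    Shared { key_index: usize },
    Object { children: Vec<(String, ReadPlan)> },
    FromParent { parent_column: String, suffix: Vec<Key> },
}

/// One row of the virtual dataset: dedicated columns plus the shared map.
struct Row {
    columns: BTreeMap<String, Json>,
    shared: BTreeMap<usize, Json>,
}

/// Stand-in for get_by_keypath: walk `path` down from `value`.
fn get_by_keypath<'a>(value: &'a Json, path: &[Key]) -> Option<&'a Json> {
    let mut cur = value;
    for key in path {
        cur = match (cur, key) {
            (Json::Object(map), Key::Name(name)) => map.get(name)?,
            (Json::Array(items), Key::Index(i)) => items.get(*i)?,
            _ => return None,
        };
    }
    Some(cur)
}

/// Materialize the value described by `plan` for a single row.
fn eval(plan: &ReadPlan, row: &Row) -> Option<Json> {
    match plan {
        ReadPlan::Direct { column } => row.columns.get(column).cloned(),
        ReadPlan::Shared { key_index } => row.shared.get(key_index).cloned(),
        ReadPlan::Object { children } => {
            // Assumption: children missing in this row are skipped, and an
            // object with no surviving children is treated as absent.
            let mut obj = BTreeMap::new();
            for (name, child) in children {
                if let Some(value) = eval(child, row) {
                    obj.insert(name.clone(), value);
                }
            }
            if obj.is_empty() { None } else { Some(Json::Object(obj)) }
        }
        ReadPlan::FromParent { parent_column, suffix } => {
            let parent = row.columns.get(parent_column)?;
            get_by_keypath(parent, suffix).cloned()
        }
    }
}
```

In this sketch, `v['user']` would be an `Object` plan whose children are `Direct` or `Shared` plans, and `v['user']['info']['tags'][0]` would be a `FromParent` plan over the `v['user']['info']` column with the suffix `tags`, `0`.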

Other changes

  • Read meta directly from virtual parquet: all virtual column metadata is derived from the parquet file, avoiding VirtualBlockMeta mismatches caused by column_id drift.
  • Reworked binder: missing virtual columns now get temporary column ids when they are synthesized (e.g. object reconstruction, shared-map fields), so planning always has stable IDs.
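
One way to picture the temporary column ids is a resolver that prefers ids found in file metadata and otherwise hands out a memoized id from a separate range. This is only a sketch: `ColumnIdResolver`, `resolve`, and `TEMP_ID_BASE` are hypothetical names, and the PR does not state how the binder actually chooses or scopes temporary ids.

```rust
use std::collections::HashMap;

/// Hypothetical start of the temporary id range, picked for this example only.
const TEMP_ID_BASE: u32 = 1 << 30;

/// Resolves a virtual column name to an id that stays stable during planning.
struct ColumnIdResolver {
    /// Ids of materialized virtual columns, as read from file metadata.
    known: HashMap<String, u32>,
    /// Ids handed out for synthesized columns (objects, shared-map fields).
    temporary: HashMap<String, u32>,
    next_temp: u32,
}

impl ColumnIdResolver {
    fn new(known: HashMap<String, u32>) -> Self {
        Self {
            known,
            temporary: HashMap::new(),
            next_temp: TEMP_ID_BASE,
        }
    }

    /// Return the existing id if there is one; otherwise allocate a
    /// temporary id and remember it so repeated lookups agree.
    fn resolve(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.known.get(name) {
            return id;
        }
        if let Some(&id) = self.temporary.get(name) {
            return id;
        }
        let id = self.next_temp;
        self.next_temp += 1;
        self.temporary.insert(name.to_string(), id);
        id
    }
}
```

Memoizing the temporary id is what keeps it stable: resolving the same synthesized column twice yields the same id, and distinct synthesized columns never collide.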

Note: This refactor does not attempt to keep compatibility with old virtual-column parquet metadata.

  • fixes: #[Link the issue here]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-refactor label (this PR changes the code base without new features or bugfix) Jan 18, 2026
@b41sh b41sh force-pushed the feat-virtual-column-new branch 2 times, most recently from 9cdc421 to 122a5a5, January 27, 2026 06:52
@b41sh b41sh force-pushed the feat-virtual-column-new branch from 122a5a5 to 50895a9, January 28, 2026 04:10
@b41sh b41sh changed the title from "refactor(query): Virtual column support shared values" to "refactor(query): Refactor virtual columns storage & read planning" Jan 28, 2026
@b41sh b41sh marked this pull request as ready for review January 28, 2026 08:26
@b41sh b41sh requested a review from sundy-li January 28, 2026 08:27

b41sh commented Jan 28, 2026

@codex review

@b41sh b41sh requested a review from dantengsky January 28, 2026 08:29

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50895a9572

@b41sh b41sh merged commit d90c93c into databendlabs:main Feb 2, 2026
89 checks passed