initial gRPC spec changes for supporting index creation and querying #8829

Merged
zehiko merged 12 commits into main from zehiko/indexing
Jan 30, 2025

@zehiko zehiko commented Jan 27, 2025

What

Expose creating and querying an index over an entire collection of data.

We add support for:

  • vector index
  • inverted index for full text search
  • btree index

Query data is always represented as a record batch with exactly one column, which carries the query data.
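
A minimal sketch of how the three index kinds and the single-column query payload could be modelled in the spec. Everything here is hypothetical and for illustration only: the package name, the `IndexKind` enum, the `IndexQuery` message, its field names, and the choice to carry the record batch as encoded bytes are assumptions, not the definitions added by this PR.

```proto
syntax = "proto3";

// Hypothetical package name, not the one used by the real spec.
package sketch.v0;

// Hypothetical enum covering the three index kinds listed above.
enum IndexKind {
  INDEX_KIND_UNSPECIFIED = 0;
  INDEX_KIND_VECTOR = 1;    // vector index
  INDEX_KIND_INVERTED = 2;  // inverted index for full text search
  INDEX_KIND_BTREE = 3;     // btree index
}

// Hypothetical query message. The payload is assumed to be an encoded
// record batch that contains exactly one column holding the query data.
message IndexQuery {
  IndexKind kind = 1;
  bytes single_column_record_batch = 2;
}
```

Keeping the payload to one column means the same message shape can serve all three index kinds; only the column's contents would differ (for example an embedding, a search phrase, or a key).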

@zehiko zehiko added exclude from changelog PRs with this won't show up in CHANGELOG.md remote-store remote store gRPC API labels Jan 27, 2025
@zehiko zehiko requested a review from jleibs January 27, 2025 15:39
@zehiko zehiko self-assigned this Jan 27, 2025
@zehiko zehiko marked this pull request as draft January 27, 2025 15:39
github-actions bot commented Jan 27, 2025

Latest documentation preview deployed successfully.

Result Commit Link
8ab34a8 https://landing-jidb1e95s-rerun.vercel.app/docs

Note: This comment is updated whenever you push a commit.

github-actions bot commented Jan 27, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
744b69c https://rerun.io/viewer/pr/8829 +nightly +main

Note: This comment is updated whenever you push a commit.

@@ -9,6 +9,16 @@ service StorageNode {
rpc Query(QueryRequest) returns (stream DataframePart) {}
rpc FetchRecording(FetchRecordingRequest) returns (stream rerun.common.v0.RerunChunk) {}

rpc IndexCollection(IndexCollectionRequest) returns (IndexCollectionResponse) {}
Contributor

Right now everything lives under one service, StorageNode. I wonder whether it would be cleaner to create several services that group similar calls logically, maybe something like CatalogService, IndexService, ...? In the future it might be nice to decide at a fine-grained level which services should spin up, for example if we want to distribute load across multiple VMs.
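
As a rough sketch of that split, the RPCs visible in the diff might be grouped as follows. The names CatalogService and IndexService come from the suggestion above; `QueryService`, the package name, and the empty message bodies are placeholders, and the grouping itself is an assumption rather than an agreed design.

```proto
syntax = "proto3";

// Hypothetical package name, not the one used by the real spec.
package sketch.v0;

// Empty placeholders so the sketch is self-contained; the real
// messages carry fields that are not shown in the quoted diff.
message IndexCollectionRequest {}
message IndexCollectionResponse {}
message QueryRequest {}
message DataframePart {}

// Catalog-related calls would move here. None appear in the quoted
// diff, so the service is left empty in this sketch.
service CatalogService {}

// Index creation, taken from the line under review.
service IndexService {
  rpc IndexCollection(IndexCollectionRequest) returns (IndexCollectionResponse) {}
}

// Hypothetical home for the existing query path.
service QueryService {
  rpc Query(QueryRequest) returns (stream DataframePart) {}
}
```

With separate services, a deployment could register only the ones it needs on a given host, which is the load-distribution benefit raised above.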

Contributor Author

There is still a tight relationship between the catalog, the collection, the collection query path, and the collection index query path. Splitting them definitely requires a bit of thought. I'll create an issue, but I won't tackle this as part of this story.

@@ -9,6 +9,16 @@ service StorageNode {
rpc Query(QueryRequest) returns (stream DataframePart) {}
rpc FetchRecording(FetchRecordingRequest) returns (stream rerun.common.v0.RerunChunk) {}

rpc IndexCollection(IndexCollectionRequest) returns (IndexCollectionResponse) {}
Member

In addition to CreateIndex, we probably want an API for ReIndex that only requires the collection + ColumnDescriptor but doesn't need the parameters.
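
One possible shape for such a call, as a sketch only. `ReIndexRequest`, `ReIndexResponse`, the `Collection` message, all field names, and the package name are hypothetical; `ColumnDescriptor` is named in the comment above, but its real fields are not shown in this conversation, so a placeholder stands in for it.

```proto
syntax = "proto3";

// Hypothetical package name, not the one used by the real spec.
package sketch.v0;

// Placeholder stand-ins for types whose real definitions are not
// shown in this conversation.
message Collection {
  string name = 1;
}
message ColumnDescriptor {
  string column_name = 1;
}

// Hypothetical request: identifies the collection and the indexed
// column, and deliberately omits any index parameters.
message ReIndexRequest {
  Collection collection = 1;
  ColumnDescriptor column = 2;
}

message ReIndexResponse {}

service StorageNode {
  rpc ReIndex(ReIndexRequest) returns (ReIndexResponse) {}
}
```

Because the request carries no parameters, the server would have to reuse whatever parameters were stored when the index was first created.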

Member

Tracking which recordings are in the index so we know whether reindexing is necessary would be nice to think about, but not necessary yet.

Contributor Author

https://github.com/rerun-io/dataplatform/issues/156

This will be tackled in a first follow-up.

@zehiko zehiko changed the title Draft: initial gRPC spec proposal for index creation and querying initial gRPC spec changes for supporting index creation and querying Jan 29, 2025
@zehiko zehiko marked this pull request as ready for review January 29, 2025 17:01
@zehiko zehiko merged commit ec2adb6 into main Jan 30, 2025
31 checks passed
@zehiko zehiko deleted the zehiko/indexing branch January 30, 2025 19:58
3 participants