create_changelog_view returns no record when end-timestamp is missing #11922

Open
1 of 3 tasks
vinitamaloo-asu opened this issue Jan 7, 2025 · 1 comment
Labels
bug Something isn't working

vinitamaloo-asu commented Jan 7, 2025

Apache Iceberg version

1.7.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Description
create_changelog_view returns no records when end-timestamp is omitted and the start snapshot's parent_id is null.

Steps to reproduce
1. Insert two records into the table, one after the other.
2. Run create_changelog_view with both start-timestamp and end-timestamp.
3. Run create_changelog_view with only start-timestamp.

Observations
Snapshots
+---------------------------+---------------------------+--------------------------+
| committed_at | snapshot_id | parent_id |
+---------------------------+---------------------------+--------------------------+
|2025-01-07 13:40:34.662 | 3486607060728746628 | NULL |
|2025-01-07 13:40:37.126 | 1370475982752787916 | 3486607060728746628 |
+---------------------------+---------------------------+--------------------------+

Run create_changelog_view with start-timestamp and end-timestamp
      CALL iceberg.system.create_changelog_view(
        compute_updates => true,
        table => 'iceberg.db.table_1',
        options => map('start-timestamp', '1736286033773', 'end-timestamp', '1736286038217'),
        identifier_columns => array('foo')
      )
Result
Returns both the records.

Run create_changelog_view with start-timestamp
      CALL iceberg.system.create_changelog_view(
        compute_updates => true,
        table => 'iceberg.db.table_1',
        options => map('start-timestamp', '1736286033773'),
        identifier_columns => array('foo')
      )
Result
Returns no records.
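To compare the option values in the two calls against the committed_at column, it can help to convert them from epoch milliseconds to UTC instants. The following is a minimal sketch; the class and method names are made up for illustration, and because the issue does not state the time zone in which committed_at is displayed, the sketch prints UTC only.

```java
import java.time.Instant;

// Hypothetical helper, not part of Iceberg: renders the epoch-millisecond
// option values from the calls above as UTC instants.
final class EpochMillisSketch {

  // Converts epoch milliseconds to an ISO-8601 UTC string.
  static String toUtc(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).toString();
  }

  public static void main(String[] args) {
    // start-timestamp used in both calls
    System.out.println(toUtc(1736286033773L)); // 2025-01-07T21:40:33.773Z
    // end-timestamp used in the first call only
    System.out.println(toUtc(1736286038217L)); // 2025-01-07T21:40:38.217Z
  }
}
```

The two instants are about 4.4 seconds apart, which is consistent with a window that brackets two commits made roughly 2.5 seconds apart.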

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
vinitamaloo-asu added the bug label Jan 7, 2025
Contributor

flyrain commented Jan 7, 2025

I suspect the line that resolves the start snapshot ID: it returns null when a snapshot has no parent. Execution then reaches this line,
if (startSnapshotId == null && endTimestamp == null) {
where the scan is marked empty only if the end timestamp is also missing, so the result differs depending on whether an end timestamp is given.

The fix should be simple: remove the following lines, and instead check whether the end snapshot ID is null after calling getEndSnapshotId(endTimestamp).

      if (startSnapshotId == null && endTimestamp == null) {
        emptyScan = true;
      }

I also suggest adding a unit test.
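The difference between the quoted check and the suggested one can be illustrated with a small stand-alone model. This is only a sketch of one possible reading of the suggestion, not Iceberg's actual code: the class, the method names, and the snapshot data are hypothetical, and it assumes that a missing end timestamp resolves to the table's most recent snapshot.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the range check discussed above; not Iceberg code.
final class ChangelogRangeSketch {

  // Commit time (epoch millis) -> snapshot ID. The values are made up.
  static final TreeMap<Long, Long> SNAPSHOTS =
      new TreeMap<>(Map.of(1_000L, 11L, 2_000L, 22L));

  // Check as quoted in the comment: a null start snapshot ID together with
  // a missing end timestamp marks the scan as empty.
  static boolean emptyScanCurrent(Long startSnapshotId, Long endTimestamp) {
    return startSnapshotId == null && endTimestamp == null;
  }

  // Assumed end resolution: no end timestamp means the latest snapshot,
  // otherwise the latest snapshot committed at or before the timestamp.
  static Long endSnapshotId(Long endTimestamp) {
    Map.Entry<Long, Long> entry =
        endTimestamp == null ? SNAPSHOTS.lastEntry() : SNAPSHOTS.floorEntry(endTimestamp);
    return entry == null ? null : entry.getValue();
  }

  // One reading of the suggestion: decide only after the end snapshot ID
  // has been resolved.
  static boolean emptyScanProposed(Long startSnapshotId, Long endSnapshotId) {
    return startSnapshotId == null && endSnapshotId == null;
  }

  public static void main(String[] args) {
    Long start = null; // the first snapshot in range has no parent
    System.out.println(emptyScanCurrent(start, null));                 // true
    System.out.println(emptyScanProposed(start, endSnapshotId(null))); // false
  }
}
```

In this model, omitting the end timestamp no longer forces an empty scan as long as an end snapshot can be resolved. A unit test along these lines could cover a start timestamp that precedes the first snapshot, both with and without an end timestamp.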
