Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e tests: limit name_len_slow to 3, split e2e tests from others #842

Merged
merged 1 commit into from
Jan 21, 2025

Conversation

skshetry
Copy link
Member

The test_query_e2e takes almost ~8mins to run (whole CI job takes 11 mins). The name_len_slow script is the main culprit, since it sleeps for 1 sec in each udf function and that mapper is run in a single process parallel mode.

474.21s call     tests/test_query_e2e.py::test_query_e2e@tmpfile

This commit adds a limit of 3 files to the name_len_slow script, which is enough, since it's only running a single process.
(We immediately interrupt the running process after seeing "UDF Processing Started" gets printed).

This also split tests into two: one for the e2e tests and one for the rest, so that these things are more obvious in the future.

Copy link

cloudflare-workers-and-pages bot commented Jan 21, 2025

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: c019173
Status: ✅  Deploy successful!
Preview URL: https://0c5f2ae9.datachain-documentation.pages.dev
Branch Preview URL: https://name-len-slow.datachain-documentation.pages.dev

View logs

@skshetry skshetry changed the title e2e tests: limit name_len_slow to 3, split e2e tests from other tests e2e tests: limit name_len_slow to 3, split e2e tests from others Jan 21, 2025
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.42%. Comparing base (1b5a585) to head (c019173).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #842   +/-   ##
=======================================
  Coverage   87.42%   87.42%           
=======================================
  Files         128      128           
  Lines       11429    11429           
  Branches     1559     1559           
=======================================
  Hits         9992     9992           
  Misses       1042     1042           
  Partials      395      395           
Flag Coverage Δ
datachain 87.36% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

[The `test_query_e2e` takes almost ~8mins to run][1] (whole CI job takes
11 mins). The `name_len_slow` script is the main culprit, since it
sleeps for 1 sec in each udf function and that mapper is run in a single
process parallel mode.

```
474.21s call     tests/test_query_e2e.py::test_query_e2e@tmpfile
```

This commit adds a limit of 3 files to the name_len_slow script, which is
enough, since it's only running a single process.
(We immediately interrupt the running process after seeing "UDF
Processing Started" gets printed).

This also split tests into two: one for the e2e tests and one for the
rest, so that these things are more obvious in the future.

[1]: https://github.com/iterative/datachain/actions/runs/12879531971/job/35907168617#step:8:82
shell: bash

- name: Run E2E tests
run: nox -s tests-${{ matrix.pyv }} -- -m "e2e" $DISABLE_REMOTES_ARG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Do we get some benefit from splitting this out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the tests be distributed amongst different workers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Do we get some benefit from splitting this out?

I hope that this will make these issues obvious by splitting them.

Will the tests be distributed amongst different workers?

Yes, it runs in the same way as the above test run. PTAL at https://github.com/iterative/datachain/actions/runs/12880225335/job/35909694647#step:9:17 where this workflow is actually run (we use pull_request_target which is running workflow from the default branch).

@skshetry skshetry merged commit 7d0913e into main Jan 21, 2025
65 checks passed
@skshetry skshetry deleted the name-len-slow branch January 21, 2025 06:30
@mattseddon
Copy link
Member

did this change break the coverage reports?

dreadatour pushed a commit that referenced this pull request Jan 27, 2025
…#842)

[The `test_query_e2e` takes almost ~8mins to run][1] (whole CI job takes
11 mins). The `name_len_slow` script is the main culprit, since it
sleeps for 1 sec in each udf function and that mapper is run in a single
process parallel mode.

```
474.21s call     tests/test_query_e2e.py::test_query_e2e@tmpfile
```

This commit adds a limit of 3 files to the name_len_slow script, which is
enough, since it's only running a single process.
(We immediately interrupt the running process after seeing "UDF
Processing Started" gets printed).

This also split tests into two: one for the e2e tests and one for the
rest, so that these things are more obvious in the future.

[1]: https://github.com/iterative/datachain/actions/runs/12879531971/job/35907168617#step:8:82
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants