test_runner: add bail out #56490

pmarchini · 2025-01-06T19:15:47Z

Catching up with the last attempt (#48919), this is another try at introducing the bailout feature.
I'm opening this PR as a draft to discuss the implementation and because refactoring may be needed if this approach is well-received by the community.

Note: In some tests, I had to enforce a concurrency=1 setting because testing the bailout feature across multiple files concurrently proved to be extremely flaky.

nodejs-github-bot · 2025-01-06T19:15:54Z

Review requested:

@nodejs/test_runner

lib/internal/test_runner/runner.js

cjihrig · 2025-01-06T19:59:57Z

> In some tests, I had to enforce a concurrency=1 setting because testing the bailout feature across multiple files concurrently proved to be extremely flaky.

Can you explain more about what was flaky? I'm guessing you mean the tests were at different points of execution when they received the bail out signal. I think the best way to work around this is to use test fixtures that never finish.

pmarchini · 2025-01-06T20:08:53Z

> Can you explain more about what was flaky? I'm guessing you mean the tests were at different points of execution when they received the bail out signal. I think the best way to work around this is to use test fixtures that never finish.

Hey @cjihrig, you're guessing right!

Your proposed solution sounds good.

I think we should add a test as follows:

First file test: A test that fails after a long timeout (maybe 5-10 seconds) to allow other file test processes to be spawned correctly.
Second file test: A test with an infinite loop.

WDYT?

lib/internal/test_runner/runner.js

doc/api/test.md

test/fixtures/test-runner/output/bail-spec.js

test/parallel/test-runner-run.mjs

cjihrig · 2025-01-06T20:52:06Z

> I think we should add a test as follows:
>
> First file test: A test that fails after a long timeout (maybe 5-10 seconds) to allow other file test processes to be spawned correctly.
>
> Second file test: A test with an infinite loop.

There are all sorts of annoying edge cases to account for here, and it might be worth doing a survey of how the tap, mocha, and vitest runners handle bailing out when things are running in parallel. It's much more straightforward when only one thing is running. But, for example, if the very first test in the first process fails, do we bother spawning the other child processes at all? Or, in a bail out situation, how important is it to have an accurate summary at the end of the test run with correct counts for total tests, cancelled tests, etc.

pmarchini · 2025-01-12T10:38:57Z

@cjihrig I'm still checking different runners, and the behaviour seems to be mostly "non-standard".
Vitest still returns the list of all the skipped test files, even after the bail out.

Mocha stops execution and returns a partial result, without reporting the full list of cancelled tests.

      const result = await runMochaAsync('options/parallel/test-*', [
        '--parallel',
        '--bail'
      ]);
      // we don't know _exactly_ how many tests will be skipped here
      // due to the --bail, but the number of tests completed should be
      // less than the total, which is 5.
      return expect(
        result.passing + result.pending + result.failing,
        'to be less than',
        5
      );

Checking tests like this one, I have the impression that more than one tool tests the feature itself on a "best effort" basis.

While I'm still checking other examples, I think the most common use case for bailing out is to stop execution as soon as possible in a CI/automation environment.
In this scenario, IMHO, all that matters is "fail fast" and "fail exit".
I'm not even sure it makes sense to provide a report after a bail; if a report is provided, I prefer an output that gives an exact trace of the actual run (so a partial one).
WDYT?
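
To make the "fail fast, fail exit, partial report" idea concrete, here is a minimal sketch. It is a simplified, synchronous model and not the `node:test` implementation; `runWithBail`, its `bail` option, and the result shape are hypothetical names chosen only for illustration.

```javascript
// Hypothetical model of bail semantics; not the node:test implementation.
function runWithBail(tests, { bail = false } = {}) {
  const results = [];
  for (const { name, fn } of tests) {
    try {
      fn();
      results.push({ name, status: 'pass' });
    } catch (error) {
      results.push({ name, status: 'fail', error });
      // Fail fast: stop running further tests after the first failure.
      if (bail) break;
    }
  }
  const failed = results.some((r) => r.status === 'fail');
  // Fail exit: any failure maps to a non-zero exit code.
  return { results, exitCode: failed ? 1 : 0 };
}

const tests = [
  { name: 'a', fn: () => {} },
  { name: 'b', fn: () => { throw new Error('boom'); } },
  { name: 'c', fn: () => {} },
];

const bailed = runWithBail(tests, { bail: true });
console.log(bailed.results.map((r) => `${r.name}: ${r.status}`), bailed.exitCode);
```

With `bail: true`, the report in this sketch lists only `a` and `b`; `c` never runs and is not mentioned, which corresponds to the "exact trace of the actual run" option rather than a full list of cancelled tests.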

Regarding the tests I was thinking about:

sequential with single file
sequential isolation none
sequential isolation none with more than 1 test file
parallel with 2 test files ( 1 with some "loading-time" and 1 with infinite waiting loop )
parallel with 2+x test files with concurrency fixed to 2 -> this in order to cover the behaviour of one test file that should not even start its run
order of the hooks while in bail out -> to ensure the hooks run as expected.
order of the hooks while in bail out -> multi file

Do you have any suggestions / other behaviours you think we should ensure?

codecov · 2025-01-12T23:11:14Z

Codecov Report

Attention: Patch coverage is 98.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.21%. Comparing base (3fe8027) to head (4e93338).
Report is 15 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/internal/test_runner/runner.js	95.23%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #56490      +/-   ##
==========================================
+ Coverage   89.19%   89.21%   +0.01%     
==========================================
  Files         662      662              
  Lines      191893   191984      +91     
  Branches    36937    36964      +27     
==========================================
+ Hits       171164   171280     +116     
+ Misses      13573    13540      -33     
- Partials     7156     7164       +8

Files with missing lines	Coverage Δ
lib/internal/test_runner/harness.js	`92.83% <100.00%> (+0.16%)`	⬆️
lib/internal/test_runner/reporter/spec.js	`96.29% <100.00%> (+0.10%)`	⬆️
lib/internal/test_runner/reporter/tap.js	`95.43% <100.00%> (+0.06%)`	⬆️
lib/internal/test_runner/reporter/utils.js	`96.84% <100.00%> (+0.13%)`	⬆️
lib/internal/test_runner/test.js	`97.11% <100.00%> (+0.04%)`	⬆️
lib/internal/test_runner/tests_stream.js	`92.13% <100.00%> (+0.41%)`	⬆️
lib/internal/test_runner/utils.js	`58.79% <100.00%> (+0.13%)`	⬆️
src/node_options.cc	`87.96% <100.00%> (+0.01%)`	⬆️
src/node_options.h	`98.33% <100.00%> (+<0.01%)`	⬆️
lib/internal/test_runner/runner.js	`90.32% <95.23%> (+1.30%)`	⬆️

... and 41 files with indirect coverage changes

cjihrig

Left some comments. I have some concerns:

The patch is pretty invasive for a feature that is not part of the default behavior. I'm not sure that it needs to be so invasive.
There seems to be a good bit of work being done when it is not necessary. Have you been able to do any performance comparisons with these changes?
The tests (thank you for adding so many tests!) seem too reliant on timers, which usually works fine locally, but is a recipe for flakiness in the CI.

cjihrig · 2025-01-13T15:35:30Z

lib/internal/test_runner/reporter/spec.js

@@ -57,6 +57,7 @@ class SpecReporter extends Transform {
  #handleEvent({ type, data }) {
    switch (type) {
      case 'test:fail':
+        if (data.details?.error?.failureType === kTestBailedOut) break;


I'm not a big fan of needing to check this for every failure, even if bail out mode is not enabled. Couldn't we break out in the test:bail event?

We could, but in that case, we would lose the "report" section.

> Couldn't we break out in the test:bail event?

Specifically, how would you do that? Would you finalise the reporter, or propagate the abort up to the test runner root?

@pmarchini we need to also account for the fact these changes impact custom reporters

@atlowChemi, I think the most expressive approach would be to avoid interrupting the test stream (or taking any action that has a similar effect), as users might still want to list the tests that were not executed because of the bail.

@cjihrig regarding avoiding unnecessary checks: I was thinking about using a set of functions "decorating" the test:fail handler with the additional check only after a test:bail has been received. This would avoid the check in most cases. WDYT?
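
The "decorating" idea above could look roughly like the following sketch. It is not the code from this PR: `SketchReporter`, the `lines` array, and the string value of `kTestBailedOut` are assumptions made for illustration, and `test:bail` is the event proposed in this thread rather than a released `node:test` event.

```javascript
// Hypothetical sketch; 'test:bail' and kTestBailedOut come from this
// discussion and are not a released node:test API.
const kTestBailedOut = 'testBailedOut'; // assumed value, for illustration

class SketchReporter {
  lines = [];

  // Default handler: no bail check at all.
  #onFail = (data) => this.#reportFailure(data);

  #reportFailure(data) {
    this.lines.push(`not ok - ${data.name}`);
  }

  handleEvent({ type, data }) {
    switch (type) {
      case 'test:bail':
        // Only after a bail do we swap in the handler with the extra check.
        this.#onFail = (d) => {
          if (d.details?.error?.failureType === kTestBailedOut) return;
          this.#reportFailure(d);
        };
        this.lines.push('# bail out');
        break;
      case 'test:fail':
        this.#onFail(data);
        break;
    }
  }
}

const reporter = new SketchReporter();
reporter.handleEvent({ type: 'test:fail', data: { name: 'a' } });
reporter.handleEvent({ type: 'test:bail', data: {} });
reporter.handleEvent({
  type: 'test:fail',
  data: { name: 'b', details: { error: { failureType: kTestBailedOut } } },
});
reporter.handleEvent({ type: 'test:fail', data: { name: 'c' } });
console.log(reporter.lines);
```

In this sketch, runs that never bail pay no per-failure cost for the check, because the checked handler is installed only once a `test:bail` event arrives.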

cjihrig · 2025-01-13T15:36:07Z

lib/internal/test_runner/reporter/tap.js

@@ -33,6 +33,7 @@ async function * tapReporter(source) {
  for await (const { type, data } of source) {
    switch (type) {
      case 'test:fail': {
+        if (data.details?.error?.failureType === lazyLoadTest().kTestBailedOut) break;


Same comment here.

Also, why is bailing out only supported in the spec and tap reporters? What about the dot reporter for example?

I haven't implemented all the reporters yet, as I'm still not convinced by the bail implementation itself.
I'll fix this before landing if we're able to reach an agreement on the implementation 🚀

lib/internal/test_runner/runner.js

lib/internal/test_runner/test.js

cjihrig · 2025-01-13T15:53:26Z

lib/internal/test_runner/test.js

@@ -763,6 +770,13 @@ class Test extends AsyncResource {

    this.passed = false;
    this.error = err;
+
+    if (this.bail && !this.root.bailed) {


I'm not a big fan of adding another way to cancel tests when we already have logic for that.

Calling root.postRun was the only way I found to immediately propagate the "stop" to the entire test tree.
I'm currently trying to achieve the same using an AbortController, but I haven't found a viable solution yet.

@cjihrig: All my doubts regarding the implementation revolve around how to interrupt the whole test tree.
I think other suggestions can be addressed, but I also have the impression that this specific part is the main blocker.
IMHO, bail is a missing and relatively important feature, and I’d like to find the best solution to land this.

Do you or other members of @nodejs/test_runner have any suggestions about this?
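
For discussion, one possible shape of the AbortController approach is sketched below. It is a simplified, synchronous model under stated assumptions and not the harness code: `createHarness`, `run`, and the log format are hypothetical, and only `AbortController` itself is a real API.

```javascript
// Hypothetical model of propagating a bail through a test tree with a
// single AbortController owned by the root; not the node:test harness.
function createHarness({ bail = false } = {}) {
  const controller = new AbortController();
  const { signal } = controller;
  const log = [];

  function run(name, fn) {
    // Every node checks the shared signal before starting.
    if (signal.aborted) {
      log.push(`${name}: cancelled`);
      return false;
    }
    let ok = true;
    // Subtests go through the same run(), so they see the same signal.
    const t = { test: (n, f) => { if (!run(n, f)) ok = false; } };
    try {
      fn(t);
    } catch {
      ok = false;
    }
    log.push(`${name}: ${ok ? 'pass' : 'fail'}`);
    // The first failure aborts the root controller, which stops the tree.
    if (!ok && bail && !signal.aborted) controller.abort();
    return ok;
  }

  return { run, log, signal };
}

const harness = createHarness({ bail: true });
harness.run('suite', (t) => {
  t.test('a', () => {});
  t.test('b', () => { throw new Error('boom'); });
  t.test('c', () => {});
});
harness.run('other', () => {});
console.log(harness.log);
```

Here the failure in `b` aborts the shared signal, so the sibling `c` and the unrelated top-level `other` are both reported as cancelled without a second cancellation mechanism.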

test/fixtures/test-runner/bailout/parallel-concurrency/first.mjs

test/fixtures/test-runner/bailout/parallel-concurrency/second.mjs

test/fixtures/test-runner/bailout/parallel-loading/infinite-loop.mjs

test/parallel/test-runner-bail.js

pmarchini · 2025-01-13T16:41:56Z

Hey @cjihrig, thanks as always for your review.
I totally understand your concerns.

Step by step:

> The patch is pretty invasive for a feature that is not part of the default behavior. I'm not sure that it needs to be so invasive.

Do you have any suggestions on how to reduce the footprint while better integrating it into the current implementation? I'm gaining more confidence in the codebase as I work on it, but I’m sure I might still be missing something important.

> There seems to be a good bit of work being done when it is not necessary. Have you been able to do any performance comparisons with these changes?

Not yet, but I’ll make sure to do that as soon as possible!

> The tests (thank you for adding so many tests!) seem too reliant on timers, which usually works fine locally, but is a recipe for flakiness in the CI.

I agree. I’ll try to use your suggested approach to coordinate the files without relying on timers or other flaky solutions.

nodejs-github-bot added the lib / src and needs-ci labels Jan 6, 2025

pmarchini force-pushed the test_runner/bail-out branch from 89f64db to 55874e8 Compare January 6, 2025 19:18

pmarchini requested review from cjihrig, MoLow, marco-ippolito and jakecastelli January 6, 2025 19:18

atlowChemi reviewed Jan 6, 2025

cjihrig reviewed Jan 6, 2025

pmarchini force-pushed the test_runner/bail-out branch from 55874e8 to f1c269c Compare January 10, 2025 17:36

pmarchini marked this pull request as ready for review January 12, 2025 21:51

cjihrig reviewed Jan 13, 2025

pmarchini added 11 commits January 18, 2025 14:30

test_runner: add bail out

28c4e69

test_runner: replace process.kill with harness abort controller

0b78797

test_runner: integrate abort signal handling in test tree creation

ba4b376

test_runner: add bail out tests

5ec7d65

test_runner: update bail-out documentation

6eed90f

test: lint fixtures

24d8a0d

test: fix assertion

10985fa

test_runner: add bailout tests for long-running and multi-file cases

ae40aca

test: remove comments

4824fbf

test_runner: remove unused test fixture references

24eb106

test_runner: remove temporary no-op for bailed tests

439a5ee

pmarchini added 2 commits January 18, 2025 14:30

test: update fixture

26ebbb2

test_runner: avoid unnecessary error creation

791fb5e

pmarchini force-pushed the test_runner/bail-out branch from db86232 to 791fb5e Compare January 19, 2025 09:37

pmarchini added 3 commits January 19, 2025 19:45

test: remove duplicated tests

1183d02

test_runner: avoid validating options.bail if not provided

524ffe8

test: prevent flaky behavior in test runner bail option

a32f885

pmarchini force-pushed the test_runner/bail-out branch from 8833b0c to a32f885 Compare January 19, 2025 21:40

test: use sync file in test-runner-output

4e93338
