Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

Open
kornelski opened this issue Jan 9, 2025 · 2 comments
Labels
A-build-scripts Area: build.rs scripts C-bug Category: bug S-triage Status: This issue is waiting on initial triage.

Comments

@kornelski
Copy link
Contributor

kornelski commented Jan 9, 2025

Problem

The current implementation of cargo::error has a minimal, concise error presentation only when the build script exits with a successful status code (0) when it reports a build failure.

If a build script fails with a non-zero status code, use of the cargo::error directive doesn't prevent Cargo from adding a full dump of stdout/stderr after it, which makes the output noisy and difficult to understand #10159.

The current implementation incentivises build scripts to exit with the status 0 (success) to cleanly report fatal build errors. This is a non-standard behavior for reporting failures, and it's not backwards compatible with older Cargo versions that don't understand cargo::error.

Dumping the full stdout/stderr output is a major problem that diminishes usefulness of cargo::error. The dump is so noisy that the errors reported via cargo::error may be less visible and less readable than a raw text printed to stderr. The stdout dump can contain long lists of cargo::rerun-if-(env-)changed directives, and other potentially irrelevant logs from subcommands. A long noisy stdout/stderr dump usually causes users to stop reading the output carefully, and gloss over the entire output instead. It may even push the "error:" lines out of the view, leaving only the raw dump on screen, which ends with less helpful content. The pkg-config and cmake crates have this problem, and they also happen to be the most frequent source of build script failures. pkg-config prints 21 cargo::rerun-if-env-changed lines, which together with Cargo's boilerplate is 25 lines, the same as the default height of cmd.exe console on Windows. Windows users may literally not see the errors generated by cargo::error if a build script fails with a failure status code. The long output dump can be made even noisier when the build script panics, and Cargo actively encourages users to enable even longer backtraces, which pushes the "error:" lines even further, and obscures even the last lines of stderr that may contain another copy of the error.

I think the conditions for when cargo::error= has a clean presentation should be swapped: it should give the cleanest presentation when the build script reports cargo::error and exits with a failure status code to match it. This is a consistent response.

When a build script prints cargo::error, but exits with a success code, this is a self-contradictory result. This is likely a bug in the build script. Such failures could be caused by helper functions or build-time dependencies eagerly printing cargo::error, even when the build script manages to recover from the error and succeeds using some other way. To help catch such rogue cargo::errors that break succesful build scripts, Cargo should be printing the full stdout dump, so that build script authors can check where in the log the unintended directive has appeared.

In other words, this is the current behavior:

no structured errors cargo::error=
exit code: success success (clean output) error (terse output)
exit code: failure dump all output error + dump all output

but to make cargo::error useful to end users, and bad error reporting debuggable, it should work like this:

no structured errors cargo::error=
exit code: success success (clean output) error + dump all output
exit code: failure dump all output error (clean output)

Steps

Buggy build.rs:

use std::process::ExitCode;

fn main() -> ExitCode {
    println!("cargo::error=oops");
    ExitCode::SUCCESS
}

happens to get a clean error output with a high signal-to-noise ratio:

error: [email protected]: oops
error: build script logged errors

A more typical script that exits with a correct status code:

fn main() {
    println!("cargo::error=oops");
    panic!("oops")
}

gets an ugly noisy dump that makes it hard for end users to understand what's going on:

error: [email protected]: oops
error: failed to run custom build command for `buildexample v0.1.0 (/path/to/buildexample)`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/path/to/buildexample/target/debug/build/buildexample-0e711ca19eba66e9/build-script-build` (exit status: 101)
  --- stdout
  cargo::error=oops

  --- stderr
  thread 'main' panicked at build.rs:3:5:
  oops
  stack backtrace:
     0: rust_begin_unwind
               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
     1: core::panicking::panic_fmt
               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
     2: build_script_build::main
     3: core::ops::function::FnOnce::call_once
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Possible Solution(s)

build scripts that report both failure and success at the same time (one via cargo:: the other via status) should get the stdout dump to help find the source of the contradictory response.

build scripts that print cargo::error= and correctly return failure code should get a clean error output with just the errors and warnings the script reported, without irrelevant information dumped by Cargo.

Version

cargo 1.86.0-nightly (fd784878c 2025-01-03)
@kornelski kornelski added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Jan 9, 2025
@epage epage added the A-build-scripts Area: build.rs scripts label Jan 9, 2025
@epage
Copy link
Contributor

epage commented Jan 9, 2025

I see there being 3 exit conditions with cargo::error

  • exit(0)
  • exit(1) (or any non-zero value)
  • panic!()

I would assume the exhaustive output would be most helpful in the case of a logic bug which would usually come in the form of a panic!(). Unfortunately, we can't distinguish those from exit(1). You mentioned trying to identify those in #10159 but I don't see a path forward that I would be comfortable with.

I don't see how the extra output would be relevant to help with an "error" with the exit(0) case and it seems like it would only serve to annoy the users into "correct" behavior. While I'm generally good with reducing and adding friction to drive specific behavior, this feels too artificial and I'm not sure if it deserves such disincentivization.

For the concern for the exit(1) case, we never really did explore the hiding of directives behind --verbose from #10159 which could be an alternative.

@kornelski
Copy link
Contributor Author

I don't see how the extra output would be relevant to help with an "error" with the exit(0) case

I do not think that developers would print cargo::error= and then exit the script with a success status code on purpose. I just don't see any motivation for doing that (other than this UI issue of course). It isn't much easier or better, or even lazier solution. It's possible to exit main with just a return, but panic!() isn't much harder to type. Where someone would call std::process::exit(0), it doesn't cost them anything to call std::process::exit(1) instead.

Exiting with an error status on failure is the correct thing to do, but this issue is not about beating non-compliant people with the best practices. It's the opposite: an observation that Rust users already value correctness, so they would find it objectionable to be forced to use incorrect and backwards-incompatible exit(0) to get better error reporting. Existing build scripts already use the right status codes, so nobody needs to be forced on that path. The compliance is already so good that non-compliance can be easily interpreted as a bug that developers would appreciate to have pointed out.

As much as I'd like to make pull requests to all the -sys crates to make them use the clean error presentation, I expect I'd have hard time convincing crate owners to accept a PR that uses exit(0) for printing errors.


Debug dump on cargo::error+exit(0) helps to detect issues like this:

fn probe_library(name: &str) -> bool {
   match search_for_it(name) {
       Ok(found) => {
          println!("cargo::rustc-link-lib={found}");
          return true;
       },
       Err(error) => {
          println!("cargo::error={error}");
          return false;
       }
    }
}

fn main() {
    if !probe_library("dep") {
        build_static().unwrap();
    }
}

This script prints cargo::error, even when it means to report a success (when build_static succeeds). Printing of cargo:: directives as a side effect on success is normal, and generally fine, because nobody probes a library to not use it. But printing of cargo::error is a bug in this situation. I imagine somebody could make such mistake, because printing something on both success and failure has a nice symmetry to it. Alternatively, such printing code could have been left in even after a helper function has been refactored from panicking/exiting to fallible. There are big and hairy build scripts that probe a bunch of libraries, so I can easily imagine logic bugs or copypaste bugs in helper functions, especially that failures of build scripts tend to be platform- and environment-specific, and "works on my machine" for the authors, so may be under-tested.


panic! can be detected by status code 101. Build scripts have no reason to set this code otherwise, so false positive rate on this will be effectively zero (and for users who really must use that exit status, exit(0) isn't suitable either).

Build scripts very often intentionally .unwrap() and panic! on expected run-time failures caused by the environment, so treating them as bugs would clash with the established practice (which is different from proper library Rust code).

However, the exit(n) vs panic! distinction is a red herring. The core issue is that a good presentation of cargo::error= doesn't work in well-behaved backwards-compatible build scripts. Whether they use panic! or Err to bubble the error, the issue remains the same.

If you think that I'm proposing to add the stdout/stderr dump on cargo::error + exit(0) to annoy users and add friction, then I hope you understand why I'm so annoyed Cargo already does this in the cargo::error + exit(1) case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-scripts Area: build.rs scripts C-bug Category: bug S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

2 participants