Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

kornelski · 2025-01-09T02:06:33Z

Problem

The current implementation of cargo::error has a minimal, concise error presentation only when the build script exits with a successful status code (0) when it reports a build failure.

If a build script fails with a non-zero status code, use of the cargo::error directive doesn't prevent Cargo from adding a full dump of stdout/stderr after it, which makes the output noisy and difficult to understand #10159.

The current implementation incentivises build scripts to exit with the status 0 (success) to cleanly report fatal build errors. This is a non-standard behavior for reporting failures, and it's not backwards compatible with older Cargo versions that don't understand cargo::error.

Dumping the full stdout/stderr output is a major problem that diminishes usefulness of cargo::error. The dump is so noisy that the errors reported via cargo::error may be less visible and less readable than a raw text printed to stderr. The stdout dump can contain long lists of cargo::rerun-if-(env-)changed directives, and other potentially irrelevant logs from subcommands. A long noisy stdout/stderr dump usually causes users to stop reading the output carefully, and gloss over the entire output instead. It may even push the "error:" lines out of the view, leaving only the raw dump on screen, which ends with less helpful content. The pkg-config and cmake crates have this problem, and they also happen to be the most frequent source of build script failures. pkg-config prints 21 cargo::rerun-if-env-changed lines, which together with Cargo's boilerplate is 25 lines, the same as the default height of cmd.exe console on Windows. Windows users may literally not see the errors generated by cargo::error if a build script fails with a failure status code. The long output dump can be made even noisier when the build script panics, and Cargo actively encourages users to enable even longer backtraces, which pushes the "error:" lines even further, and obscures even the last lines of stderr that may contain another copy of the error.

I think the conditions for when cargo::error= has a clean presentation should be swapped: it should give the cleanest presentation when the build script reports cargo::error and exits with a failure status code to match it. This is a consistent response.

When a build script prints cargo::error, but exits with a success code, this is a self-contradictory result. This is likely a bug in the build script. Such failures could be caused by helper functions or build-time dependencies eagerly printing cargo::error, even when the build script manages to recover from the error and succeeds using some other way. To help catch such rogue cargo::errors that break succesful build scripts, Cargo should be printing the full stdout dump, so that build script authors can check where in the log the unintended directive has appeared.

In other words, this is the current behavior:

	no structured errors	`cargo::error=`
exit code: success	success (clean output)	error (terse output)
exit code: failure	dump all output	error + dump all output

but to make cargo::error useful to end users, and bad error reporting debuggable, it should work like this:

	no structured errors	`cargo::error=`
exit code: success	success (clean output)	error + dump all output
exit code: failure	dump all output	error (clean output)

Steps

Buggy build.rs:

use std::process::ExitCode;

fn main() -> ExitCode {
    println!("cargo::error=oops");
    ExitCode::SUCCESS
}

happens to get a clean error output with a high signal-to-noise ratio:

error: [email protected]: oops
error: build script logged errors

A more typical script that exits with a correct status code:

fn main() {
    println!("cargo::error=oops");
    panic!("oops")
}

gets an ugly noisy dump that makes it hard for end users to understand what's going on:

error: [email protected]: oops
error: failed to run custom build command for `buildexample v0.1.0 (/path/to/buildexample)`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `/path/to/buildexample/target/debug/build/buildexample-0e711ca19eba66e9/build-script-build` (exit status: 101)
  --- stdout
  cargo::error=oops

  --- stderr
  thread 'main' panicked at build.rs:3:5:
  oops
  stack backtrace:
     0: rust_begin_unwind
               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
     1: core::panicking::panic_fmt
               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
     2: build_script_build::main
     3: core::ops::function::FnOnce::call_once
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Possible Solution(s)

build scripts that report both failure and success at the same time (one via cargo:: the other via status) should get the stdout dump to help find the source of the contradictory response.

build scripts that print cargo::error= and correctly return failure code should get a clean error output with just the errors and warnings the script reported, without irrelevant information dumped by Cargo.

Version

cargo 1.86.0-nightly (fd784878c 2025-01-03)

The text was updated successfully, but these errors were encountered:

epage · 2025-01-09T02:24:08Z

I see there being 3 exit conditions with cargo::error

exit(0)
exit(1) (or any non-zero value)
panic!()

I would assume the exhaustive output would be most helpful in the case of a logic bug which would usually come in the form of a panic!(). Unfortunately, we can't distinguish those from exit(1). You mentioned trying to identify those in #10159 but I don't see a path forward that I would be comfortable with.

I don't see how the extra output would be relevant to help with an "error" with the exit(0) case and it seems like it would only serve to annoy the users into "correct" behavior. While I'm generally good with reducing and adding friction to drive specific behavior, this feels too artificial and I'm not sure if it deserves such disincentivization.

For the concern for the exit(1) case, we never really did explore the hiding of directives behind --verbose from #10159 which could be an alternative.

kornelski · 2025-01-09T04:27:21Z

I don't see how the extra output would be relevant to help with an "error" with the exit(0) case

I do not think that developers would print cargo::error= and then exit the script with a success status code on purpose. I just don't see any motivation for doing that (other than this UI issue of course). It isn't much easier or better, or even lazier solution. It's possible to exit main with just a return, but panic!() isn't much harder to type. Where someone would call std::process::exit(0), it doesn't cost them anything to call std::process::exit(1) instead.

Exiting with an error status on failure is the correct thing to do, but this issue is not about beating non-compliant people with the best practices. It's the opposite: an observation that Rust users already value correctness, so they would find it objectionable to be forced to use incorrect and backwards-incompatible exit(0) to get better error reporting. Existing build scripts already use the right status codes, so nobody needs to be forced on that path. The compliance is already so good that non-compliance can be easily interpreted as a bug that developers would appreciate to have pointed out.

As much as I'd like to make pull requests to all the -sys crates to make them use the clean error presentation, I expect I'd have hard time convincing crate owners to accept a PR that uses exit(0) for printing errors.

Debug dump on cargo::error+exit(0) helps to detect issues like this:

fn probe_library(name: &str) -> bool {
   match search_for_it(name) {
       Ok(found) => {
          println!("cargo::rustc-link-lib={found}");
          return true;
       },
       Err(error) => {
          println!("cargo::error={error}");
          return false;
       }
    }
}

fn main() {
    if !probe_library("dep") {
        build_static().unwrap();
    }
}

This script prints cargo::error, even when it means to report a success (when build_static succeeds). Printing of cargo:: directives as a side effect on success is normal, and generally fine, because nobody probes a library to not use it. But printing of cargo::error is a bug in this situation. I imagine somebody could make such mistake, because printing something on both success and failure has a nice symmetry to it. Alternatively, such printing code could have been left in even after a helper function has been refactored from panicking/exiting to fallible. There are big and hairy build scripts that probe a bunch of libraries, so I can easily imagine logic bugs or copypaste bugs in helper functions, especially that failures of build scripts tend to be platform- and environment-specific, and "works on my machine" for the authors, so may be under-tested.

panic! can be detected by status code 101. Build scripts have no reason to set this code otherwise, so false positive rate on this will be effectively zero (and for users who really must use that exit status, exit(0) isn't suitable either).

Build scripts very often intentionally .unwrap() and panic! on expected run-time failures caused by the environment, so treating them as bugs would clash with the established practice (which is different from proper library Rust code).

pkg-config has all examples with unwrap(): https://docs.rs/pkg-config/latest/pkg_config/#example
system-deps tells users to add code with unwrap(): https://docs.rs/system-deps/latest/system_deps/
cc crate panics on errors reported by C compilers: https://docs.rs/cc/latest/cc/struct.Build.html#method.compile
zstd-sys uses the panicking compilation method, and unwraps all over the place: https://github.com/gyscos/zstd-rs/blob/bfe1e34f593c2427e9dc35f1b98d06788174711f/zstd-safe/zstd-sys/build.rs#L225
curl-sys too: https://github.com/alexcrichton/curl-rust/blob/e6f7be6077a28727909135bb40ca82c6465a9a69/curl-sys/build.rs#L447
zlib-sys too: https://github.com/rust-lang/libz-sys/blob/8462d47d51e36c8cd7fa83db3cbcc2b725385650/build.rs#L147
lzma-sys too: https://github.com/alexcrichton/xz2-rs/blob/1a82c40d6d80171b7df328aea43b7054acd10c44/lzma-sys/build.rs#L77
openssl-sys intentionally panics as a way to print messages to the user: https://github.com/sfackler/rust-openssl/blob/538a5cb737e8d83085553cac01643820dc7ff205/openssl-sys/build/main.rs#L422-L432
bzip2-sys too: https://github.com/rust-lang/git2-rs/blob/81522979178da3751ba9ebe460f9e45cda706a6e/libgit2-sys/build.rs#L45-L51
libsqlite3-sys too: https://github.com/rusqlite/rusqlite/blob/903677bda605029d53a46bf546ab53240191a30f/libsqlite3-sys/build.rs#L344

However, the exit(n) vs panic! distinction is a red herring. The core issue is that a good presentation of cargo::error= doesn't work in well-behaved backwards-compatible build scripts. Whether they use panic! or Err to bubble the error, the issue remains the same.

If you think that I'm proposing to add the stdout/stderr dump on cargo::error + exit(0) to annoy users and add friction, then I hope you understand why I'm so annoyed Cargo already does this in the cargo::error + exit(1) case.

kornelski added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Jan 9, 2025

epage added the A-build-scripts Area: build.rs scripts label Jan 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

kornelski commented Jan 9, 2025 •

edited

Loading

epage commented Jan 9, 2025

kornelski commented Jan 9, 2025

Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

Presentation of cargo::error encourages build scripts to exit with a successful status code on failure #15038

Comments

kornelski commented Jan 9, 2025 • edited Loading

Problem

Steps

Possible Solution(s)

Version

epage commented Jan 9, 2025

kornelski commented Jan 9, 2025

kornelski commented Jan 9, 2025 •

edited

Loading