Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix downstream cargo vendor usage for component adapter crate #6486

Closed

Conversation

rvolosatovs
Copy link
Member

@rvolosatovs rvolosatovs commented May 31, 2023

(I have removed the old description, since it's no longer relevant)
Duplicate WIT dependencies of wasi crate in wasi-preview1-component-adapter to avoid using relative paths in wit-bindgen invocations fixing downstream dependencies on wasi-preview1-component-adapter using cargo vendor

This is checked in CI, but developers are free to use whichever tool they chose to achieve this. Running wit-deps lock from crates/wasi-preview1-component-adapter is one way to do it

@rvolosatovs rvolosatovs force-pushed the fix/cargo-vendor branch 3 times, most recently from c0fc3e0 to c1a4310 Compare May 31, 2023 19:12
rvolosatovs added a commit to rvolosatovs/wasmCloud that referenced this pull request May 31, 2023
Temporary, until bytecodealliance/wasmtime#6486
is merged

Signed-off-by: Roman Volosatovs <[email protected]>
@rvolosatovs rvolosatovs changed the title refactor: reorganize WIT and dependencies update and reorganize WIT May 31, 2023
@rvolosatovs rvolosatovs changed the title update and reorganize WIT update and reorganize WIT, fix cargo vendor May 31, 2023
@rvolosatovs rvolosatovs marked this pull request as ready for review May 31, 2023 19:32
@rvolosatovs rvolosatovs requested a review from a team as a code owner May 31, 2023 19:32
@rvolosatovs rvolosatovs requested review from pchickey and removed request for a team May 31, 2023 19:32
rvolosatovs added a commit to rvolosatovs/wasmCloud that referenced this pull request May 31, 2023
Temporary, until bytecodealliance/wasmtime#6486
is merged

Signed-off-by: Roman Volosatovs <[email protected]>
@pchickey
Copy link
Contributor

Thanks, I'll look at this in detail this afternoon.

To answer the first design question, yes lets please use the wit-deps binary in CI just like we did in p2-p. Please make a shell script inside ci/ (lets say ./ci/wit-deps-check.sh) which invokes the wit-deps CLI & runs git diff in the correct directories, and call that script from a CI step that is just like the one we added to build the component adapter.

It would also be nice if the shell script had (e.g. ./ci/wit-deps-check.sh update) would modify the working tree to what should be committed for the check to pass in CI, and print that out with the error in CI so folks who don't know this tool don't stumble over it.

rvolosatovs added a commit to wasmCloud/wasmCloud that referenced this pull request May 31, 2023
Temporary, until bytecodealliance/wasmtime#6486
is merged

Signed-off-by: Roman Volosatovs <[email protected]>
@alexcrichton
Copy link
Member

fix cargo vendor use cases

Theoretically the "verify-publish" step in CI should be checking this by running cargo package on all component crates and then building them from the result. That should verify that all crates build as they'll appear on crates.io, but given how things are probably broken currently to motivate the change in this PR my guess is that this isn't working as expected.

Do you have an example crate/coment with cargo vendor that's not working? If so I'd like to poke around at it and see if the "verify-publish" step can be improved to verify this continues to work into the future.

@pchickey
Copy link
Contributor

The diff here looks good. Roman and I had a discussion on zulip for how to add the CI check for this.

The component-adapter and test-programs crates would not have worked with cargo-vendor before this PR. I don't know which of those was a problem for vendoring into wasmcloud, but both of those are excluded from our verify-publish step.

@rvolosatovs
Copy link
Member Author

The diff here looks good. Roman and I had a discussion on zulip for how to add the CI check for this.

The component-adapter and test-programs crates would not have worked with cargo-vendor before this PR. I don't know which of those was a problem for vendoring into wasmcloud, but both of those are excluded from our verify-publish step.

For the record, we're vendoring the component adapter as a bindep https://github.com/wasmCloud/wasmCloud/tree/8544ef854e1695381def129c68c341a5fd9940e3/tests/wasi-adapter, if you're interested, that's how it's built: https://github.com/wasmCloud/wasmCloud/blob/8544ef854e1695381def129c68c341a5fd9940e3/tests/actors/build.rs#L116-L134 along with other Wasm as part of tests (so just cargo test from root is sufficient to run the whole test suite when using nightly, while the crate itself can be depended upon by stable Rust)

The trick is that our reproducible build process happens in a sandbox with no access to non-deterministic I/O, e.g. no networking without the matching content hash specified. It looks something like this:

  1. All dependencies are fetched from either a package registry (e.g. crates.io) or git repositories using hashes from Cargo.lock (which lets us fetch the actual contents, since we know the content hashes upfront)
  2. With all dependencies fetched a "vendor tree" is build, like cargo vendor would do (e.g. for published packages: https://github.com/ipetkov/crane/blob/35110cccf28823320f4fd697fcafcb5038683982/lib/downloadCargoPackage.nix#L19-L23 and https://github.com/ipetkov/crane/blob/35110cccf28823320f4fd697fcafcb5038683982/lib/vendorCargoRegistries.nix#L45-L103, there's similar logic for git dependencies, which we rely upon to vendor this repository)
  3. Later the actual cargo build step happens "offline", using the vendor directory constructed in (2.), just like it would work using plain cargo vendor

@rvolosatovs rvolosatovs requested a review from a team as a code owner June 1, 2023 10:29
@rvolosatovs rvolosatovs force-pushed the fix/cargo-vendor branch 3 times, most recently from d6d6b1c to b22c571 Compare June 1, 2023 11:16
@alexcrichton
Copy link
Member

I think it would simplify things for Wasmtime quite a bit if there could only be one set of WIT files managed here, rather than three. A large risk with multiple sets of WIT files that are all duplicates is that they can get out of sync, and there's no protection against that right now. In theory they should all be the exact same and ideally use something like symlinks but that doesn't all work out for windows/vendoring reasons.

Would it be possible to update your vendoring process of the component adapter to manually copy around the WIT files or similarly? Or alternatively could the vendored artifact be the Wasmtime repository since that's what the component adapter expects to be built within the context of?


Also, as an orthogonal point, right now deps.toml points to main.tar.gz archives but I think that main will need to be replaced with a commit or otherwise re-vendoring the WIT files will change over time as the branches get updated.

@rvolosatovs
Copy link
Member Author

rvolosatovs commented Jun 1, 2023

I think it would simplify things for Wasmtime quite a bit if there could only be one set of WIT files managed here, rather than three. A large risk with multiple sets of WIT files that are all duplicates is that they can get out of sync, and there's no protection against that right now. In theory they should all be the exact same and ideally use something like symlinks but that doesn't all work out for windows/vendoring reasons.

There's only one set of WIT files managed "from external sources" - in crates/wasi/wit, both component adapter and reactor-tests inherit from whatever is used within crates/wasi/wit, that's also why order matters in PATHS (crates/wasi/wit is a dependency of the other two crates)

Would it be possible to update your vendoring process of the component adapter to manually copy around the WIT files or similarly? Or alternatively could the vendored artifact be the Wasmtime repository since that's what the component adapter expects to be built within the context of?

That's exactly how it works in this PR, e.g. https://github.com/rvolosatovs/wasmtime/blob/b22c571b911c1bbdd2d9356d72de18606d3e08ce/crates/test-programs/reactor-tests/wit/deps.toml#L1 this just means copy all WIT definitions from this path along with all transitive dependencies

Also, as an orthogonal point, right now deps.toml points to main.tar.gz archives but I think that main will need to be replaced with a commit or otherwise re-vendoring the WIT files will change over time as the branches get updated.

That's not the case, wit-deps is lazy by default, so for any dependency X, for which Y = hash(X), if deps.lock contains Y as the hash of X, wit-deps lock (and corresponding wit_deps::lock_path) does not do anything.

  • Roughly, hash(X) = sha256sum(tar(filter_wit(X)) where X is a path to a dependency, e.g. crates/wasi/wit/deps/io and filter_wit filters all top-level WIT files in the directory. tar is deterministic.

Update:

Update 2:
Here's the "local path" case implementation https://github.com/bytecodealliance/wit-deps/blob/b4314067818a45413fc6c366983f19a80b2626fa/crates/wit-deps/src/manifest.rs#L193-L212

Update 3:
You could try running wasmtime-wit-deps with RUST_LOG=trace to see what happens on disk

@alexcrichton
Copy link
Member

Ah ok this PR has changed radically since I last looked at it, I apologize for not bringing myself back up to speed before leaving a comment. I'll also admit that I have yet to investigate, learn, and understand wit-deps in depth. I was previously naively assuming that it worked the same way as Cargo, or that it was intended to work similarly to Cargo. There's a few things I want to comment about though:

  • In a vacuum I don't think there's any reason to have multiple copies of WIT files in-tree. The previous build where relative paths are used works out for this repository's build. The crates in this repository using relative paths are not intended to be repackaged/re-vendored elsewhere. Your build process tries to use cargo vendor and what I'm asking is whether it's possible to update your build process to not do that. Can you use Wasmtime as a git checkout or something similar which allows building the crates as-is in-tree? Otherwise while the preview = ".." directive in deps.toml does solve the duplication issue I mentioned above that, to me at least, seems like a questionable design decision for wit-deps as a tool. I'll say again though that I don't fully understand the motivations of wit-deps beyond "it should be like Cargo" and I am also not following the development of it closely. That being said a blind "include that other file here" (as I'm interpreting what preview = "..." means) is not supported by Cargo and I'm not sure why would necessarily want to be supported long-term with wit-deps.
  • Along the lines of "I think wit-deps is like Cargo" the lock file format here does not work. The purpose of Cargo is that given a manifest and a lock file it can redownload and recover all artifacts to be able to build locally. Instead, however, this lock file is instead only recording the hash of what's vendored locally, which if that's the case I don't know why a lock file is used because there's not much need to hash what's already in the repository. Using main.tar.gz there's no reliable way for anyone else to reproduce the vendoring process which is what Cargo.lock is used for. I understand that wit-deps could probably be smart enough to not update things if not requested, but that to me seems like a subtle anti-pattern in the design of wit-deps where it's not as robust as it otherwise could be in terms of vendoring dependencies. This is why I dont' think main.tar.gz is used because in my mind the vendored state should be reproducible at any time by anyone.
  • Personally I don't understand why there's a new tool, much less and entire new workspace, being added to Wasmtime. If wit-deps is a CLI tool then it seems like we should run that CLI tool. Pat and I talked a bit this morning about this and one concern was that local developers may not have the same version as wit-deps used in CI, but we already have that "problem" with cargo vet and it generally works out fine. Basically CI and local developers track the latest version of wit-deps and if you get a newer version in CI when updating WIT files you'd update the CI version at the same time.

Orthogonally from my above comments, why are crates/wasi/wit/{cli,clocks,filesystem} added in this PR? Are they an aritfact of an intermediate state which weren't removed? I don't believe they're read by wit-parser and otherwise they aren't sharing bits with the copies under deps/{cli,clocks,filesystem}.

Additionally the world command, command-extended, proxy, and reactor I see are added as top-level "local" files in crates/wasi/wit/*.wit. Are these not present in any other upstream WASI repository? If not could they be added to the wasi:cli and wasi:http repositories respectively and referred-to from there?

@rvolosatovs
Copy link
Member Author

rvolosatovs commented Jun 2, 2023

  • In a vacuum I don't think there's any reason to have multiple copies of WIT files in-tree. The previous build where relative paths are used works out for this repository's build. The crates in this repository using relative paths are not intended to be repackaged/re-vendored elsewhere. Your build process tries to use cargo vendor and what I'm asking is whether it's possible to update your build process to not do that. Can you use Wasmtime as a git checkout or something similar which allows building the crates as-is in-tree? Otherwise while the preview = ".." directive in deps.toml does solve the duplication issue I mentioned above that, to me at least, seems like a questionable design decision for wit-deps as a tool. I'll say again though that I don't fully understand the motivations of wit-deps beyond "it should be like Cargo" and I am also not following the development of it closely. That being said a blind "include that other file here" (as I'm interpreting what preview = "..." means) is not supported by Cargo and I'm not sure why would necessarily want to be supported long-term with wit-deps.

How is this any different from cargo path dependencies as documented here: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-path-dependencies ? Sure I wish copying WIT files would not be necessary, but the reason it behaves this way is the fact that wit-bindgen requires all dependencies to be located in deps directory, which I would rather call a questionable design decision of wit-bindgen.
Usage of path dependencies (in wit-deps, but similarly in cargo) is generally discouraged and the use case is mostly just development, quickly replacing a dependency by a local, modified, copy.
Ideally, all WIT packages should specify their "real" dependencies with an external source. (e.g. the standard repositories) I'd be happy to do that, but 2 things are blocking that:

  1. Wasmtime does not use upstream WIT interfaces (filesystem, most importantly - clocks and cli are "transitively broken" due to different filesystem package used)
  2. There's no standard repository containing the command, reactor from what I can see
  • Along the lines of "I think wit-deps is like Cargo" the lock file format here does not work. The purpose of Cargo is that given a manifest and a lock file it can redownload and recover all artifacts to be able to build locally. Instead, however, this lock file is instead only recording the hash of what's vendored locally, which if that's the case I don't know why a lock file is used because there's not much need to hash what's already in the repository. Using main.tar.gz there's no reliable way for anyone else to reproduce the vendoring process which is what Cargo.lock is used for.

That's not true, for any given gzipped tarball URL X: hash(untar(fetch(X))) will give you the exact same hash as I described in #6486 (comment), see also that comment for hash definition. The reasoning here is simple, X could be a URL to a zip file or whatever else, still the hash in the lock would be the same and the integrity/equality of the contents could be verified. The lock file provides enough data precisely to be able to reproduce previous operation.

Soon it should not have to use tar, but rather it will be able to consume dependencies as Wasm bytecodealliance/wit-deps#25, at which point the Wasm is also what'd be hashed - but that again depends on wit-bindgen being able to actually consume dependencies as Wasm and not require raw WIT.

Note, that this tool was built originally as a temporary solution to address unmaintainable mess of copy-pasting WIT files around from external sources. Soon, downstream users of WIT interfaces will not need to actually use WIT directly, so it'll only be WIT interface developers, who will need some kind of wit management tool.

I understand that wit-deps could probably be smart enough to not update things if not requested, but that to me seems like a subtle anti-pattern in the design of wit-deps where it's not as robust as it otherwise could be in terms of vendoring dependencies. This is why I dont' think main.tar.gz is used because in my mind the vendored state should be reproducible at any time by anyone.

It's true that in the absence of an entry in local cache, main branch references and similar cannot be reproduced if they change. The only way around that would be implementing git support directly in wit-deps to record the rev that main pointed to at the time. I went with the quickest and simplest solution for MVP, just so that, again, downstream users of WIT packages, standard or not, could reasonably consume them without resorting to adding multiple submodules/subtrees/copy-pasting. The intended use case is indeed to pin to exact commits like so https://github.com/bytecodealliance/wit-deps/blob/b4314067818a45413fc6c366983f19a80b2626fa/tests/build/wit/deps.toml#L1-L2, in which case all of this is fully reproducible. I've only added the update functionality and examples with "dynamic" branch references after the first usage here WebAssembly/wasi-http#23, when I realized there's a need downstream to be able to "simply" pull in changes.

Note that wit-bindgen, again, cannot consume WIT standard repositories as-is, because of the wit directory within those repositories, which means that simply adding a submodule to a repository in wit/deps/pkg is not possible. Again, I'd say that's a wit-bindgen design drawback, which wit-deps, again, addresses to eliminate this headache from downstream developers consuming real WIT packages.

  • Personally I don't understand why there's a new tool, much less and entire new workspace, being added to Wasmtime. If wit-deps is a CLI tool then it seems like we should run that CLI tool. Pat and I talked a bit this morning about this and one concern was that local developers may not have the same version as wit-deps used in CI, but we already have that "problem" with cargo vet and it generally works out fine. Basically CI and local developers track the latest version of wit-deps and if you get a newer version in CI when updating WIT files you'd update the CI version at the same time.

The new workspace is only there due to cargo vet auditing, please see OP, I'm happy to drop b22c571 if someone could certify all the dependencies used. The reason for a new tool is, in fact, simplification of the update and consistency checking process, originally @pchickey's request #6486 (comment) - rather than having to manually copy-paste WIT around as is done today (and prone to mistakes, note how a bunch of WIT files were missed by the update in #6390) there is now just one tool with 2 subcommands, which take no arguments and can automate all of this tedious procedure and do "the right thing", as well as check consistency in CI.

It's nice to provide a working dev environment out-of-the-box. Requiring developers of Wasmtime to manually install development dependencies, to me, is an anti-pattern, that's why a Rust crate instead is very useful - it works cross-platform and developers do not need to install yet another binary executable.

Orthogonally from my above comments, why are crates/wasi/wit/{cli,clocks,filesystem} added in this PR? Are they an aritfact of an intermediate state which weren't removed? I don't believe they're read by wit-parser and otherwise they aren't sharing bits with the copies under deps/{cli,clocks,filesystem}.

Assuming that every WIT dependency must have an external source (e.g. a standard repo), deps/{cli,clocks,filesystem} are these sources, which should eventually be removed in favor of using upstream (see the TODO comment). Regarding different contents I cannot reproduce:

rvolosatovs@cobalt ..ecodealliance/wasmtime/crates/wasi/wit (git)-[fix/cargo-vendor] % for dep in cli clocks filesystem; do sha256sum $dep/*; done                       
99f45bfa45638e87bcb8e1c7b8fad65c2ac60fd70500fa447268371a0d87ed10  cli/environment.wit
a917b0e0381f0cb3f86e9c88c0fc84b46388595cf4858f02de2d597caee82e31  cli/exit.wit
2c60a8c95149393ac3698eb766da44c3e570dfca38646256c888ecb34f9717ab  cli/preopens.wit
57f8549af0058bf1bf58eccb9820715a907b2c17825e8eb7213ab647381935be  cli/stdio.wit
f0418770f24f7130d2eaf5f43316edc145ec4a524e77f02ace7db6171dcd8c76  clocks/monotonic-clock.wit
caf183ddda4bb6693f0cc9d3249e42a58a21091ff1764e92f652dfc8a1bbff6c  clocks/timezone.wit
44e062d552753f939b4dec8ce87b499ba46fdc0e88e8d630d7fd1bc0d29f7e13  clocks/wall-clock.wit
ce4e225baf21a915435f807a0bb47bc34f43227c0424156cab7c6c536a5d1755  filesystem/filesystem.wit
rvolosatovs@cobalt ..ecodealliance/wasmtime/crates/wasi/wit (git)-[fix/cargo-vendor] % for dep in cli clocks filesystem; do sha256sum deps/$dep/*; done
99f45bfa45638e87bcb8e1c7b8fad65c2ac60fd70500fa447268371a0d87ed10  deps/cli/environment.wit
a917b0e0381f0cb3f86e9c88c0fc84b46388595cf4858f02de2d597caee82e31  deps/cli/exit.wit
2c60a8c95149393ac3698eb766da44c3e570dfca38646256c888ecb34f9717ab  deps/cli/preopens.wit
57f8549af0058bf1bf58eccb9820715a907b2c17825e8eb7213ab647381935be  deps/cli/stdio.wit
f0418770f24f7130d2eaf5f43316edc145ec4a524e77f02ace7db6171dcd8c76  deps/clocks/monotonic-clock.wit
caf183ddda4bb6693f0cc9d3249e42a58a21091ff1764e92f652dfc8a1bbff6c  deps/clocks/timezone.wit
44e062d552753f939b4dec8ce87b499ba46fdc0e88e8d630d7fd1bc0d29f7e13  deps/clocks/wall-clock.wit
ce4e225baf21a915435f807a0bb47bc34f43227c0424156cab7c6c536a5d1755  deps/filesystem/filesystem.wit

Similarly, for the other two WIT packages:

rvolosatovs@cobalt ..es/wasi-preview1-component-adapter/wit (git)-[fix/cargo-vendor] % for dep in cli clocks filesystem; do sha256sum deps/$dep/*; done
99f45bfa45638e87bcb8e1c7b8fad65c2ac60fd70500fa447268371a0d87ed10  deps/cli/environment.wit
a917b0e0381f0cb3f86e9c88c0fc84b46388595cf4858f02de2d597caee82e31  deps/cli/exit.wit
2c60a8c95149393ac3698eb766da44c3e570dfca38646256c888ecb34f9717ab  deps/cli/preopens.wit
57f8549af0058bf1bf58eccb9820715a907b2c17825e8eb7213ab647381935be  deps/cli/stdio.wit
f0418770f24f7130d2eaf5f43316edc145ec4a524e77f02ace7db6171dcd8c76  deps/clocks/monotonic-clock.wit
caf183ddda4bb6693f0cc9d3249e42a58a21091ff1764e92f652dfc8a1bbff6c  deps/clocks/timezone.wit
44e062d552753f939b4dec8ce87b499ba46fdc0e88e8d630d7fd1bc0d29f7e13  deps/clocks/wall-clock.wit
ce4e225baf21a915435f807a0bb47bc34f43227c0424156cab7c6c536a5d1755  deps/filesystem/filesystem.wit
rvolosatovs@cobalt ..crates/test-programs/reactor-tests/wit (git)-[fix/cargo-vendor] % for dep in cli clocks filesystem; do sha256sum deps/$dep/*; done
99f45bfa45638e87bcb8e1c7b8fad65c2ac60fd70500fa447268371a0d87ed10  deps/cli/environment.wit
a917b0e0381f0cb3f86e9c88c0fc84b46388595cf4858f02de2d597caee82e31  deps/cli/exit.wit
2c60a8c95149393ac3698eb766da44c3e570dfca38646256c888ecb34f9717ab  deps/cli/preopens.wit
57f8549af0058bf1bf58eccb9820715a907b2c17825e8eb7213ab647381935be  deps/cli/stdio.wit
f0418770f24f7130d2eaf5f43316edc145ec4a524e77f02ace7db6171dcd8c76  deps/clocks/monotonic-clock.wit
caf183ddda4bb6693f0cc9d3249e42a58a21091ff1764e92f652dfc8a1bbff6c  deps/clocks/timezone.wit
44e062d552753f939b4dec8ce87b499ba46fdc0e88e8d630d7fd1bc0d29f7e13  deps/clocks/wall-clock.wit
ce4e225baf21a915435f807a0bb47bc34f43227c0424156cab7c6c536a5d1755  deps/filesystem/filesystem.wit

What exactly is different?

Additionally the world command, command-extended, proxy, and reactor I see are added as top-level "local" files in crates/wasi/wit/*.wit. Are these not present in any other upstream WASI repository? If not could they be added to the wasi:cli and wasi:http repositories respectively and referred-to from there?

Not that I know of, these seem to be WASI-specific, which is why I moved them there. Maybe there should be a wasi-preview repository containing them, whatever it is - it's out of scope for this PR.
wit/deps/preview should imply that that's an external dependency from the perspective of the WIT package and that's not the case, in fact the preview package is defined in-tree, so I moved to it where it belongs.

@elliottt elliottt removed the request for review from a team June 2, 2023 16:42
@rvolosatovs
Copy link
Member Author

I've removed all dependency updates and any reorganization from this PR, it's just fixing the cargo vendor use case for downstream crates. I really hope we can align on getting this fix over the line

@rvolosatovs rvolosatovs changed the title update and reorganize WIT, fix cargo vendor fix downstream cargo vendor usage for component adapter crate Jun 15, 2023
@pchickey
Copy link
Contributor

Sorry this stalled, I was on vacation and am digging myself out.

It looks like this diff is down to just using the wit-deps tool to make sure that the two copies of wit remain identical. Can we instead just use diff(1), invoked with the right paths via a new script in ./ci/, to perform that role?

@rvolosatovs
Copy link
Member Author

not relevant for our use case anymore, closing

@rvolosatovs rvolosatovs deleted the fix/cargo-vendor branch July 19, 2023 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants