Prevent wasm functions from being able to access their source. #106
Comments
Hi 👋, I agree with you that an OCI image of fs layers is neither the best nor the long-term answer for distributing Wasm applications. As you alluded to, it was what we did to bootstrap. In the longer term, I'd like to see us move to an OCI Artifact representation for Wasm applications (components). I've been working on a proposal for OCI support in BCA SIG-Registry. Feedback is appreciated.
That may not necessarily be the case for all things. Consider a Spin app that contains triggers for multiple .wasm Microservices. If you were to distribute that application, you would likely want to deliver the app as one package. Also of note, @jsturtevant is diving into stream processors in containerd to see if we can light up custom mediatypes (artifacts). |
thanks for adding me to the discussion!
I am curious to know why the OCI image format is not the long-term answer to distributing the Wasm application. Even if it is more flexible, since it allows packing multiple files as part of the same layer, is there anything missing that cannot be done at all today? I am asking since the way we are treating wasm payload in crun is the same as any other container, so I might be missing important use cases here :-) |
As the Wasm ecosystem moves more toward components, the thought is that components will form a graph of dependencies which would be nice not to duplicate in the registry. This comment talks a bit about it: bytecodealliance/registry#87 (comment). |
I wrote a few thoughts on this here deislabs/containerd-wasm-shims#89 (comment)
The biggest concern I have is being platform agnostic (deislabs/containerd-wasm-shims#89). Using the image format today could potentially lead to multi-arch images down the line. I think the ideal to drive towards would be to have a single WASM artifact that could be run on any system.
do we have this risk if we specify something like |
This is, I suppose, a real use of ENTRYPOINT, but isn't it also possible for Spin to dispatch via different containers? It feels like a rather special case if the side effect is that most use cases end up having to expose the wasm just to allow this, when there are alternative ways to do routing.
Not so much multi-arch, but multi-OS? (wasi is grey area, but more on the OS side). I'm not sure how often multi-platform images are built for the OS side of it, but I think it is really the killer feature of using OCI at all (vs http fetch for what amounts to a single file for most people). Knowing there is a transition process, people should be able to build twice, rather than bloat the rootFS with two different variants (wasip1 and wasip1+N). I think this plays to the strength of OCI in other words to use multi-platform to help folks both run today, and possibly on experimental new versions of wasi. I harvested opencontainers/image-spec#1053 out of the issue you linked. |
I saw bytecodealliance/registry#87 pitched on this thread. I would like to clarify it is more a discussion about component model and not really anything specifically about this. That said, a future CM spec probably should also not leak its source to calling functions, so happy CM folks are watching! |
@codefromthecrypt I did not understand your comment completely 😅 regarding the pitfall of |
FWIW, for some work I'm thinking about in dapr, I'm likely to do that meanwhile, too (yank the wasm into a buffer and delete the underlying file) |
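A minimal sketch of the "yank the wasm into a buffer and delete the underlying file" workaround mentioned above. The function name `load_and_unlink` is hypothetical and this is not how runwasi, crun, or dapr implement it. It also assumes the mounted rootFS is writable and that the guest is only instantiated after the call returns; the file still exists in the image layers themselves.

```python
import os


def load_and_unlink(path):
    """Read the whole wasm binary into memory, then remove it from the
    mounted filesystem so the guest cannot open its own source later."""
    with open(path, "rb") as f:
        wasm_bytes = f.read()
    # Remove the directory entry before the guest gets access to the mount.
    os.remove(path)
    return wasm_bytes
```

The returned bytes would then be handed to the runtime's compile step instead of a file path.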
Imagine if you had to use "bsd" for anything "bsd like". Using "wasi" for all things related to that term is going to break similarly. More importantly, the ecosystem will eventually need a bridge from current wasi (aka wasip1) to work-in-progress versions (first wasip2, then p3, and however many until 1.0).
So "wasi" usually means preview 1, specifically functions similar to syscalls described here. However, there's a lot of work, and new jargon ("component model"), around the successor to this, which is wholly incompatible yet confusingly branded preview 2. Specifically, completing the feature surface of preview1 means the work-in-progress CLI World ("world" is also new jargon). So preview2 will be something like that, and will be unstable for at least another preview or so.
So we've been conflating "wasi" with a set of system calls (ABI), specifically the preview1 definition that has been out for a couple of years. However, WASI is an organization of people, not a spec, and the next version won't work with the former.
So, let's say we used the word "wasi" regardless, as an OS. Probably all runtimes, regardless of underlying OS (well, at least linux vs darwin), would work if the code in that layer was wasip1. However, if it wasn't, it probably won't work for most. At runtime you would get failures like an incorrect function signature.
To prevent this known pitfall, compilers, starting with Go and now others like Rust and JS, are distancing themselves from "wasi" being like an ABI, because it isn't, and are being a bit more specific. Hence "wasip1" and "wasip2" allow users to bundle a multi-platform image with both syscall approaches. They will likely need to do that because, even if the latter were stable (we know there will be a p3, so it won't be), the two would have dramatically different performance profiles, and not all runtimes will support "wasip2" for a while.
just an idea, but to differentiate among different "wasi" versions, could we use the variant field in the OCI image? Similar to what is done for the ARM architecture: https://github.com/opencontainers/image-spec/blob/main/image-index.md#platform-variants |
It could, but I think it would need to be os.version, because this is at the OS level, not arch (wasm is the arch). There was a discussion about this approach on the Go issue, which would have required creating a new variable GOWASI since wasi isn't an arch. This was eventually dismissed for reasons including that wasip2+ is an entirely different impl (a rewrite from scratch) vs wasip1 (derived from cloudabi): golang/go#58141 (comment). Also, with a few compilers moving this to a single field, it seems better to choose something that might match at least one compiler.
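To make the single-field idea concrete, here is a rough sketch of how a runtime might pick a manifest from a multi-platform OCI image index keyed on `platform.os`. The `"wasip1"`/`"wasip2"` values follow the naming discussed in this thread (matching Go's `GOOS=wasip1`), the digests and sizes are placeholders, and `select_manifest` is a hypothetical helper, not part of containerd or runwasi.

```python
# Hypothetical image index: digests and sizes are placeholders, and the
# "wasip1"/"wasip2" os values are the naming proposed in this thread.
INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "size": 1,
            "platform": {"architecture": "wasm", "os": "wasip1"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "size": 1,
            "platform": {"architecture": "wasm", "os": "wasip2"},
        },
    ],
}


def select_manifest(index, supported_os, architecture="wasm"):
    """Return the first manifest descriptor matching the runtime's
    preference-ordered list of supported os values, or None."""
    for os_name in supported_os:
        for desc in index.get("manifests", []):
            platform = desc.get("platform", {})
            if (platform.get("architecture") == architecture
                    and platform.get("os") == os_name):
                return desc
    return None
```

A runtime that supports both would pass `["wasip2", "wasip1"]` to prefer the newer variant, while a preview1-only runtime would pass `["wasip1"]` and never see the incompatible module.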
@codefromthecrypt, this thread is starting to wander. Perhaps, the runwasi community call or CNCF slack #runwasi would be a good place to discuss some of these broader issues. Per the opening of the thread:
I guess I'm not entirely clear on what you mean about Wasm can inspect itself.
Yup, two issues I can see here.
More generally, per the first issue, the world had to get something running, so most of the "cruft" you're seeing in terms of containers in the classic sense are the impacts of getting up and running using an informal standard. In fact, runwasi doesn't actually care at all what the platform is -- at the moment. It WILL of course, as people start to understand what's the best way to be specific in this space. I would say the same about item 2. More generally, we ARE doing something new here and using a system that wasn't designed for this. As a result, there IS an objective to do things in an understandable way from the pov of OCI but there is no objective to "be canonical OCI" at the moment. To get there, you bet. In any case, to chew on all this, let's take @devigned's suggestion and move to the runwasi channel for the detailed back and forth! When we figure out concrete issues, variations on themes and so on, we can create issues here and move on them. BTW, @giuseppe do return with any experimentation you do here; I had some great talks with Tony Kay at wasmio and love what you're doing. |
One way I think we could handle this down the road is to have wasm modules be a blob by themselves with their own media type and then rootfs layers would be another blob with the normal types. With this approach containerd would have to know about this split and populate the OCI bundle accordingly. |
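A sketch of the split described above, assuming the wasm blob is identified by its own layer media type. `application/wasm` is used here as an assumed value for that type, and `split_layers` is a hypothetical helper; containerd's real unpacking logic is not shown.

```python
# Assumption: the wasm module is published as a layer with this media type.
WASM_LAYER_TYPE = "application/wasm"

# Standard OCI rootfs layer media types from the image spec.
ROOTFS_LAYER_TYPES = {
    "application/vnd.oci.image.layer.v1.tar",
    "application/vnd.oci.image.layer.v1.tar+gzip",
    "application/vnd.oci.image.layer.v1.tar+zstd",
}


def split_layers(manifest):
    """Partition manifest layers into (wasm_blobs, rootfs_layers).

    Only rootfs layers would be unpacked into the bundle's filesystem;
    wasm blobs would be passed to the runtime out of band.
    """
    wasm_blobs, rootfs_layers = [], []
    for desc in manifest.get("layers", []):
        media_type = desc.get("mediaType")
        if media_type == WASM_LAYER_TYPE:
            wasm_blobs.append(desc)
        elif media_type in ROOTFS_LAYER_TYPES:
            rootfs_layers.append(desc)
        else:
            raise ValueError(f"unsupported layer media type: {media_type!r}")
    return wasm_blobs, rootfs_layers
```

With this shape the guest never sees its own binary unless someone also places it in a rootfs layer.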
This is far better also for the runtimes. Because the runtime shouldn't really provision a filesystem at all unless it is really required. For example, some non OCI stuff looks at imports and if there are no fs-based imports, there's really no need to make an FS. Even if there are imports, and even if you are fine reading or overwriting the wasm, maybe you don't want to give a bad actor a way to fill disk. Finally, with a separate layer you don't need ENTRYPOINT to be a specific filename by convention, possibly broken as people try to hack stuff into the filename #102 |
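The "look at imports before provisioning an FS" idea can be sketched by reading the import section of a core wasm module. This is a simplified parser, not any runtime's actual code: it handles only function, table, memory, and global imports, and treating `path_*` imports from `wasi_snapshot_preview1` as "filesystem-based" is a heuristic assumed for illustration.

```python
WASI_P1 = "wasi_snapshot_preview1"


def _read_u32(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def _read_name(buf, pos):
    """Decode a length-prefixed UTF-8 name; return (name, next_pos)."""
    length, pos = _read_u32(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def _skip_limits(buf, pos):
    """Skip a limits entry: flag byte, min, and max if the flag says so."""
    flags = buf[pos]
    _, pos = _read_u32(buf, pos + 1)
    if flags & 0x01:
        _, pos = _read_u32(buf, pos)
    return pos


def list_imports(wasm):
    """Return [(module, field), ...] from a core wasm module's import section."""
    if wasm[:8] != b"\x00asm\x01\x00\x00\x00":
        raise ValueError("not a version 1 core wasm module")
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = _read_u32(wasm, pos + 1)
        if section_id != 2:  # 2 is the import section; skip everything else
            pos += size
            continue
        imports = []
        count, pos = _read_u32(wasm, pos)
        for _ in range(count):
            module, pos = _read_name(wasm, pos)
            field, pos = _read_name(wasm, pos)
            kind = wasm[pos]
            pos += 1
            if kind == 0:      # function: type index
                _, pos = _read_u32(wasm, pos)
            elif kind == 1:    # table: reference type byte, then limits
                pos = _skip_limits(wasm, pos + 1)
            elif kind == 2:    # memory: limits
                pos = _skip_limits(wasm, pos)
            elif kind == 3:    # global: value type byte and mutability byte
                pos += 2
            else:
                raise ValueError(f"unsupported import kind: {kind}")
            imports.append((module, field))
        return imports
    return []


def needs_filesystem(wasm):
    """Heuristic (an assumption): path_* imports imply filesystem access."""
    return any(m == WASI_P1 and f.startswith("path_")
               for m, f in list_imports(wasm))
```

A runtime could skip creating any filesystem when `needs_filesystem` returns false, though a real check would need to consider more imports than this heuristic does.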
I've put together a proposal that addresses the concern by using OCI artifacts in the way @cpuguy83 described above: https://docs.google.com/document/d/11shgC3l6gplBjWF1VJCWvN_9do51otscAm0hBDGSSAc |
As #147 has been merged, can we close this issue? |
Closing this issue as the CNCF Wasm-WG community has standardized the OCI Artifact format for Wasm, which addresses this problem by isolating the Wasm binary.
TL;DR; wasm can currently inspect itself, and it probably shouldn't. A way to prevent this either by format/tooling or config is maybe a good idea.
Opening here despite the possibility of a better repo. I also considered crun. Feel free to punt me to a better place.
It seems the ENTRYPOINT is a marker to identify the %.wasm file which is possibly amongst other files in rootFS layers. Code like below shows the guest must be inside the rootFS. In other words the rootFS is mounted, and the same source includes the wasm.
runwasi/crates/containerd-shim-wasmtime/src/instance.rs
Line 137 in 74c08ce
I think this is convenient as it allows re-use of tools, but it would be surprising from a black-box perspective, or compared to how normal wasm runtimes work. Normally the wasm source is specified independent of any filesystem mounts, and it would be surprising or a mistake for someone to mount their wasm in a place functions can accidentally or otherwise inspect it.
In other words, if I had to guess, someone thought about using an existing wasm layer type or a custom one (remember wasm is a single file so has no benefit of layers), but that would require changes to Dockerfile or its successors and said, nah. Maybe? I really don't know why choices were made, but it seems reasonable if the goal was to get building with the existing ecosystem.
This said, I think there are a lot of things that will take time to correct. I think a way to not leak the source wasm is worth asking for, either as a runtime-specific feature (here) or in some spec (no idea where).
Copying some people who may have thoughts and would act differently perhaps based on outcome,
I intentionally spammed only 3, so yeah feedback welcome regardless of from whom. I think we should have a clear rationale, even if reverse engineered, on this one.