Description
System.IO.File.Copy(source_string, dest_string) has been observed to return success but leave corrupt data in the destination file.
dotnet 8.0.110, Ubuntu 22.04.5 LTS, x64.
Hypothesis
runtime/src/libraries/System.Private.CoreLib/src/System/IO/File.cs
Line 56 in 03b2d3d
sets up destination file handle here
runtime/src/libraries/System.Private.CoreLib/src/System/IO/FileSystem.Unix.cs
Line 47 in 03b2d3d
CopyFile then proceeds operating on the destination file handle. Various syscalls to write() are issued and succeed. (Note source and destination are on different filesystems, so no CopyFileRange. Unclear if relevant.)
No fsync is issued.
Disposal of the destination file handle then happens. I think this calls through to
runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/InteropServices/SafeHandle.cs
Line 267 in 03b2d3d
https://man7.org/linux/man-pages/man2/close.2.html says
> Dealing with error returns from close()
> A careful programmer will check the return value of close(),
> since it is quite possible that errors on a previous write(2)
> operation are reported only on the final close() that releases
> the open file description. Failing to check the return value
> when closing a file may lead to silent loss of data.

And follows up with
> A careful programmer who wants to know about I/O errors may
> precede close() with a call to fsync(2).

We've definitely seen in the wild with NFS that:
- write syscalls have succeeded
- close fails (for whatever reason - the network is fallible)
- data is lost
- success is reported to the user
Running under strace and deliberately injecting faults is what leads me to the above.
This happens to only a vanishingly small fraction of operations; it isn't the case that my network or disks are entirely screwed. It's just that when it does happen (a few MB in every PB of writes in my repro), the data is bad and the application can't tell.
Reproduction Steps
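```csharp
System.IO.File.Copy("input", "output");
```
And something like the following, compiled and bashed in with LD_PRELOAD. All credit Copilot, all bugs mine.
```c
#define _GNU_SOURCE  /* required for RTLD_NEXT */
#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

typedef int (*orig_close_t)(int);
/* Interposed close(): looks up the real close() and wraps it. */
int close(int fd) {
    static orig_close_t orig_close = NULL;
    if (!orig_close) {
        orig_close = (orig_close_t)dlsym(RTLD_NEXT, "close");
    }
    char path[1024], target[1024];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    /* The snippet as posted ends here; the rest is an assumed completion. */
    ssize_t len = readlink(path, target, sizeof(target) - 1);
    int rc = orig_close(fd);  /* really release the descriptor */
    if (rc == 0 && len > 0) {
        target[len] = '\0';
        if (strstr(target, "output") != NULL) {  /* assumed match on the destination name */
            errno = EIO;  /* report a failed close for the copy destination */
            return -1;
        }
    }
    return rc;
}
```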
Or a variant on `strace -e trace=close -e fault=close:error=EIO -p <pid>` to achieve the same.
Expected behavior
Either success and destination file has complete data, or failure reported to caller.
In the example above, that necessitates either not silently ignoring the close() call, or issuing an fsync beforehand.
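The two options above can be combined at the syscall level. The following is a minimal sketch in C, not the runtime's actual code: `flush_and_close` is a hypothetical helper name, and the choice to report the first error seen is an assumption. It calls fsync() so that deferred write errors surface before the descriptor is released, and it still checks the result of close() instead of discarding it.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: flush fd to stable storage, then close it.
 * Returns 0 only if both fsync() and close() succeeded; otherwise
 * returns -1 with errno set to the first error that was seen. */
static int flush_and_close(int fd) {
    int saved = 0;
    if (fsync(fd) != 0) {
        saved = errno;      /* deferred write error surfaced by fsync() */
    }
    if (close(fd) != 0 && saved == 0) {
        saved = errno;      /* error reported only at close() */
    }
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 result as a failed copy and report it, rather than returning success. Note that close() is always attempted, even after a failed fsync(), so the descriptor is not leaked.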
Actual behavior
Silent data loss.
Regression?
No response
Known Workarounds
Can work around with lower level APIs, but "everyone" will use this one so it should probably be made safe.
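To illustrate what a lower-level workaround has to do, here is a sketch in C of a copy routine that checks every call, including fsync() and close() on the destination. The function name `copy_file_checked`, the buffer size and the file mode are all assumptions made for illustration. A .NET workaround would presumably follow the same shape with its own stream APIs, which is not shown here.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical workaround sketch: copy src to dst, checking every call.
 * Returns 0 on success, -1 on failure with errno set. */
int copy_file_checked(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { int e = errno; close(in); errno = e; return -1; }

    char buf[65536];
    int err = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0) break;                        /* end of file */
        if (n < 0) { if (errno == EINTR) continue; err = errno; break; }
        ssize_t off = 0;
        while (off < n) {                         /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) { if (errno == EINTR) continue; err = errno; break; }
            off += w;
        }
        if (err) break;
    }
    if (!err && fsync(out) != 0) err = errno;     /* surface deferred write errors */
    if (close(out) != 0 && !err) err = errno;     /* do not ignore close() */
    close(in);                                    /* read-only: result not needed */
    if (err) { errno = err; return -1; }
    return 0;
}
```

The key difference from the behavior described in this issue is in the last few lines: a failure from either fsync() or close() on the destination turns into a failure of the whole copy.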
Configuration
dotnet 8.0.110
Ubuntu 22.04.5 LTS
x64
Not believed specific to any of these
Other information
No response