Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem in File.Copy (and possibly elsewhere) on Linux #111807

Open
rgr21 opened this issue Jan 24, 2025 · 1 comment
Open

Problem in File.Copy (and possibly elsewhere) on Linux #111807

rgr21 opened this issue Jan 24, 2025 · 1 comment
Labels
area-System.IO untriaged New issue has not been triaged by the area owner

Comments

@rgr21
Copy link

rgr21 commented Jan 24, 2025

Description

System.IO.File.Copy(source_string, dest_string) has been observed to return success but leave corrupt data in the destination file.

dotnet 8.0.110, Ubuntu 22.04.5 LTS, x64.

Hypothesis

FileSystem.CopyFile(Path.GetFullPath(sourceFileName), Path.GetFullPath(destFileName), overwrite);

sets up destination file handle here
using SafeFileHandle dst = SafeFileHandle.Open(destFullPath, overwrite ? FileMode.Create : FileMode.CreateNew,

CopyFile then proceeds operating on the destination file handle. Various syscalls to write() are issued and succeed. (Note source and destination are on different filesystems, so no CopyFileRange. Unclear if relevant.)

No fsync is issued.

Disposal of the destination file handle then happens. I think this calls through to

where the code from ReleaseHandle (which is where our syscall close happens) is ignored.

https://man7.org/linux/man-pages/man2/close.2.html says

Dealing with error returns from close()
A careful programmer will check the return value of close(),
since it is quite possible that errors on a previous write(2)
operation are reported only on the final close() that releases
the open file description. Failing to check the return value
when closing a file may lead to silent loss of data.

And follows up with

A careful programmer who wants to know about I/O errors may
precede close() with a call to fsync(2).

We've definitely seen in the wild with NFS that:

  1. write syscalls have succeeded
  2. close fails (for whatever reason - the network is fallible)
  3. data is lost
  4. success is reported to the user

strace and deliberately injecting faults leads me to the above.

This is only happening to a diminishingly small amount of operations, it isn't the case that my network or disks are entirely screwed. It's just that when it does happen - a few MB every PB of writes in my repro - data is bad and the application can't tell.

Reproduction Steps

System.IO.File.Copy("input", "output");

And something like the following, compiled and bashed in with LD_PRELOAD. All credit Copilot, all bugs mine.

`#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

typedef int (*orig_close_t)(int);

int close(int fd) {
static orig_close_t orig_close = NULL;
if (!orig_close) {
orig_close = (orig_close_t)dlsym(RTLD_NEXT, "close");
}
char path[1024];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

char resolved_path[1024];
if (readlink(path, resolved_path, sizeof(resolved_path)) > 0) {
    if (strstr(resolved_path, "output")) {
        errno = EIO;
        return -1;
    }
}

return orig_close(fd);

}`

Or a variant on
strace -e trace=close -e fault=close:error=EIO -p <pid>
to achieve the same

Expected behavior

Either success and destination file has complete data, or failure reported to caller.
In the example above, that necessitates either not silently ignoring the close() call, or issuing an fsync beforehand.

Actual behavior

Silent data loss.

Regression?

No response

Known Workarounds

Can work around with lower level APIs, but "everyone" will use this one so it should probably be made safe.

Configuration

dotnet 8.0.110
Ubuntu 22.04.5 LTS
x64

Not believed specific to any of these

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 24, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

1 participant