Make write(IO, Char) actually return the amount of printed bytes instead of the attempted written bytes. #56980

Merged
merged 2 commits into master from gb/writebytes
Jan 10, 2025

Member

@gbaraldi gbaraldi commented Jan 7, 2025

This might break some tests, but I want to see which ones.

@Seelengrab
Contributor

Seelengrab commented Jan 7, 2025

Isn't this aligning the implementation with the documented behavior? So this should actually be a bugfix, no?

write(io::IO, x)

[...] Return the number of bytes written into the stream.

@gbaraldi
Member Author

gbaraldi commented Jan 7, 2025

Yes, @topolarity and I found this while griping about how hard it is to tell whether write truncated bytes when given non-string-like values. I was confused about why writing to a full buffer always appeared to succeed.

@gbaraldi gbaraldi added io Involving the I/O subsystem: libuv, read, write, etc. bugfix This change fixes an existing bug backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 labels Jan 7, 2025
@Seelengrab
Contributor

This should definitely get a regression test before it's merged, though. Perhaps something like

io = IOBuffer(;maxsize=1)
write(io, 'a')
@test write(io, 'a') == 0

?
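To make the proposed check concrete without depending on which Julia version is installed, here is a minimal sketch. `counted_write` is a hypothetical helper (not part of Base) that mirrors the counting logic this PR describes: it sums what each byte-level write reports rather than assuming every byte landed. It relies only on the byte-level `write(::IOBuffer, ::UInt8)` returning 0 once `maxsize` is reached.

```julia
# Hypothetical helper mirroring the counting logic discussed in this PR:
# sum what each byte-level write reports instead of assuming success.
function counted_write(io::IO, c::Char)
    u = bswap(reinterpret(UInt32, c))  # leading UTF-8 byte ends up in the low bits
    n = 0
    while true
        n += write(io, u % UInt8)      # contributes 0 when the stream accepts nothing
        (u >>= 8) == 0 && return n
    end
end

io = IOBuffer(; maxsize=1)
first_write = counted_write(io, 'a')   # the buffer has room for one byte
second_write = counted_write(io, 'a')  # the buffer is now full
```

Under these assumptions the first call reports one byte and the second reports none, which is the behavior the suggested `@test` is meant to pin down for `write(io, 'a')` itself.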

    while true
-       write(io, u % UInt8)
+       n += write(io, u % UInt8)
        (u >>= 8) == 0 && return n
Contributor

@Seelengrab Seelengrab Jan 7, 2025

This currently unconditionally advances the given character, but what happens in case the first write fails, and the second succeeds? Now there's suddenly a torn write involved here, and even though you can theoretically know that not all of the given Char has been written (e.g. getting a return value of 3 when a 4-byte Char is passed), you still wouldn't know which byte was dropped.

I think it would be good to return after the first failing write, so that it's at least knowable that a valid prefix has been written (if the return value is nonzero).
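A minimal sketch of that early-return idea, assuming a byte-level write reports 1 on success and 0 on refusal. `prefix_write` is a hypothetical name used only for illustration; it is not what the PR merged.

```julia
# Hypothetical variant: stop at the first byte the stream refuses,
# so a nonzero return value always describes a contiguous prefix.
function prefix_write(io::IO, c::Char)
    u = bswap(reinterpret(UInt32, c))
    n = 0
    while true
        write(io, u % UInt8) == 1 || return n  # bail out on the first refused byte
        n += 1
        (u >>= 8) == 0 && return n
    end
end

io = IOBuffer(; maxsize=2)
written = prefix_write(io, '€')   # '€' is three bytes in UTF-8: e2 82 ac
stored = take!(io)                # the bytes that actually reached the buffer
```

With a two-byte buffer, the return value is 2 and the stored bytes are exactly the first two bytes of the character, so a caller knows where to resume.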

Contributor

Is there any Julia IO type where writing a byte can fail, return zero, and then succeed, without some error being thrown?

Contributor

For example, writing a byte with TranscodingStreams.jl will either return 1 or throw an error.

Contributor

Sure, a non-blocking buffered IO whose buffer is temporarily full, for example. I don't know whether there currently is such a type in the ecosystem, but the point is that it could exist and would be a valid IO, as far as I can tell.

Contributor

Here's a (slightly contrived) example:

julia> io = IOBuffer(; maxsize=1)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=1, ptr=1, mark=-1)

julia> write(io, 'a')
1

julia> write(io, 'a') # should be 0 with this PR, since the write doesn't succeed
1

julia> seekstart(io); # simulate a read-end on some other process, for example

julia> read(io, Char) # happens on the read-end
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> write(io, 'a') # continue writing
1

It's a bit awkward to do this with an IOBuffer, but the principle is the same for some IO type that has an actual read end that's distinct from the write end. For arbitrary I/O, it's usually preferable to drop data on the write end and retry later once the buffer is ready to send again. With the current behavior, the writer wouldn't know what to retransmit over the I/O, since it's impossible to know which byte(s) of the Char were not transmitted correctly. Effectively, the number returned by write becomes irrelevant and only matters when it matches sizeof(Char), at which point we might as well return true/false. If we instead abort as soon as any internal write fails, we know that at least a correct prefix of the Char (or of any data, in the general case) was written, and we can retry with only the data that we haven't attempted to transmit yet.
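The retry scheme can be sketched as follows. Everything here is hypothetical and for illustration only: `BoundedSink` is a toy IO that refuses bytes (returning 0) once it holds `capacity` bytes, `write_prefix` stops at the first refusal, and emptying `sink.data` stands in for a reader draining the other end.

```julia
# Hypothetical bounded sink: accepts bytes until `capacity` is reached,
# then reports 0 instead of throwing.
struct BoundedSink <: IO
    data::Vector{UInt8}
    capacity::Int
end
BoundedSink(capacity::Integer) = BoundedSink(UInt8[], capacity)

Base.write(s::BoundedSink, b::UInt8) =
    length(s.data) < s.capacity ? (push!(s.data, b); 1) : 0

# Write bytes in order and stop at the first refusal, so the return value
# counts a contiguous prefix of `bytes`.
function write_prefix(io::IO, bytes::AbstractVector{UInt8})
    n = 0
    for b in bytes
        write(io, b) == 1 || break
        n += 1
    end
    return n
end

sink = BoundedSink(2)
payload = collect(codeunits("€a"))   # four bytes: e2 82 ac 61
received = UInt8[]

sent = write_prefix(sink, payload)   # only two bytes fit
append!(received, sink.data)         # stand-in for the reader draining the sink
empty!(sink.data)
sent += write_prefix(sink, view(payload, sent+1:length(payload)))  # resend only the unsent tail
append!(received, sink.data)
```

Because the count always describes a prefix, the writer can resume at `sent + 1` and the reader ends up with the original byte sequence.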

Member Author

I don't think write ever errors for us, given asyncio and other stuff?

Member

@JeffBezanson JeffBezanson Jan 9, 2025

It does, e.g. trying to write to a read-only stream. write itself has a synchronous API, i.e. it is (task-)blocking.

Contributor

@Seelengrab Seelengrab Jan 9, 2025

zero is not a "valid" return value in that case.

There is interesting historical data suggesting that some implementations of libc write were indeed able to return 0: https://stackoverflow.com/a/41970485

For quite a bunch of kinds of files, the behavior is unspecified, so more or less anything goes either way 🤷

I don't think write ever errors for us, given asyncio and other stuff?

Right, and for a non-blocking buffered IO it would be incredibly awkward to throw actual errors just because it's full. That possibility would be incredibly detrimental in the common case of success. I admit that having 0 signal this is quite a bad API, though. I guess this is yet another case where something like a Result{Int, Err} sum type would be nice, to distinguish success from errors 🤔

Maybe let me put it another way - would this be a valid IO subtype (barring some other missing methods)?

struct FlakyIO <: IO
    io::IO
end

Base.write(fio::FlakyIO, b::UInt8) = rand(Bool) ? write(fio.io, b) : 0

You could get very fancy and record which writes succeeded & which ones failed for introspection later on, or do some more complicated scheme for deciding when exactly it "fails" to write anything. This kind of type would be incredibly useful for fuzzing stuff that accidentally depends on writes to IO always succeeding (like the fallback method of write in Base does, for example).

One issue I see with just throwing an error for partial writes/write failures of parts of larger types is that then the return value of write becomes meaningless - either we always get a full write, or we get an error. There would be no more room for partial writes, which can happen in a bunch of cases.
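As a rough illustration of the recording idea, here is a deterministic cousin of FlakyIO. `ScriptedIO` and `sum_writes` are hypothetical names: a fixed script replaces `rand(Bool)` so the outcome is reproducible, and `sum_writes` is a stand-in for any loop that accumulates byte-level return values without stopping at a failure.

```julia
# Hypothetical deterministic cousin of FlakyIO: `script` decides which byte
# writes succeed, and `log` records each attempted byte with its outcome.
struct ScriptedIO <: IO
    io::IOBuffer
    script::Vector{Bool}
    log::Vector{Pair{UInt8,Bool}}
end
ScriptedIO(script) = ScriptedIO(IOBuffer(), collect(Bool, script), Pair{UInt8,Bool}[])

function Base.write(s::ScriptedIO, b::UInt8)
    ok = get(s.script, length(s.log) + 1, true)  # attempts beyond the script succeed
    push!(s.log, b => ok)
    return ok ? write(s.io, b) : 0
end

# Accumulate every byte-level result without stopping at a failure.
function sum_writes(io::IO, bytes)
    n = 0
    for b in bytes
        n += write(io, b)
    end
    return n
end

sio = ScriptedIO([true, false, true])
n = sum_writes(sio, codeunits("€"))   # attempts e2, 82, ac
stored = take!(sio.io)                # the bytes that actually got through
```

Here the count is 2, but the stored bytes are the first and third bytes of the character, not a prefix. Only the log reveals which byte was dropped; the return value alone cannot.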

Member

This conversation is worth continuing, but for the purposes of fixing this bug I think it's orthogonal.

Our AbstractArray write method can also suffer "torn" writes in the same way:

function unsafe_write(s::IO, p::Ptr{UInt8}, n::UInt)
    written::Int = 0
    for i = 1:n
        written += write(s, unsafe_load(p, i))
    end
    return written
end

This is probably worth splitting into a separate issue and fixing across the board. The only thing I think this needs before merging, @gbaraldi, is a test.

Member

@topolarity topolarity Jan 10, 2025

Filed #57011 to continue discussion here

@JeffBezanson JeffBezanson added the needs tests Unit tests are required for this change label Jan 7, 2025
@topolarity topolarity added merge me PR is reviewed. Merge when all tests are passing and removed needs tests Unit tests are required for this change labels Jan 9, 2025
@IanButterworth IanButterworth merged commit 6ac351a into master Jan 10, 2025
8 of 9 checks passed
@IanButterworth IanButterworth deleted the gb/writebytes branch January 10, 2025 02:24
@topolarity topolarity removed the merge me PR is reviewed. Merge when all tests are passing label Jan 10, 2025
Labels
backport 1.10 Change should be backported to the 1.10 release backport 1.11 Change should be backported to release-1.11 bugfix This change fixes an existing bug io Involving the I/O subsystem: libuv, read, write, etc.
6 participants