On 01/20/2018 09:07 PM, Jens Axboe wrote:
> On 1/20/18 7:23 PM, Goldwyn Rodrigues wrote:
>>
>>
>> On 01/20/2018 08:11 PM, Andi Kleen wrote:
>>> Goldwyn Rodrigues <rgoldwyn@xxxxxxx> writes:
>>>
>>>> From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
>>>>
>>>> In case direct I/O encounters an error midway, it returns the error.
>>>> Instead it should be returning the number of bytes transferred so far.
>>>
>>> It's likely there's a lot of code in user space that does
>>>
>>> if (write(..., N) < 0) handle error
>>>
>>> With your change it would need to be
>>>
>>> if (write(..., N) != N) handle error
>>>
>>> How much code is actually doing that?
>>>
>>> I can understand it fixes your artificial test suite, but it seems to me your
>>> change has a high potential to break a lot of existing user code
>>> in subtle ways. So it seems to be a bad idea.
>>>
>>> -Andi
>>>
>>
>>
>> Quoting 'man 2 write':
>>
>> RETURN VALUE
>> On success, the number of bytes written is returned (zero indicates
>> nothing was written). It is not an error if this number is smaller
>> than the number of bytes requested; this may happen for example because
>> the disk device was filled. See also NOTES.
>
> You can quote as much man page as you want - Andi is well aware of how
> read/write system call works, as I'm sure all of us are, that is not the
> issue. The issue is that there are potentially LOTS of applications out
> there that do not check for short writes, they do exactly what Andi
> speculated above. If you break it with this change, it doesn't matter
> what's in the man page. What matters is previous behavior, and that
> you are breaking user space. At that point nobody cares what's in the
> man page.
>

Agreed. So how do you think we should fix this so that it accommodates
userspace applications that do not handle short writes, and is still
consistent?
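
For concreteness, the two userspace patterns Andi describes can be
sketched as below. This is only an illustration, not code from the patch:
write_all() is a hypothetical helper name, and treating a zero-byte
return as EIO is an assumption of this sketch rather than anything
write(2) mandates.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * The fragile pattern under discussion:
 *
 *     if (write(fd, buf, n) < 0)
 *             handle_error();
 *
 * If write() returns a short count instead of an error, this check
 * passes and the unwritten tail of the buffer is silently dropped.
 */

/*
 * Hypothetical helper (not from the patch, not part of libc): call
 * write(2) repeatedly until all `count` bytes are written or a real
 * error occurs.  Returns 0 on success, -1 on error with errno set.
 * If `written` is non-NULL it receives the number of bytes that did
 * reach the file, even when the call fails.
 */
static int write_all(int fd, const void *buf, size_t count, size_t *written)
{
    const char *p = buf;
    size_t done = 0;
    int ret = 0;

    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            ret = -1;       /* real error; `done` bytes already landed */
            break;
        }
        if (n == 0) {
            errno = EIO;    /* assumption: no progress counts as an error */
            ret = -1;
            break;
        }
        done += (size_t)n;  /* short write: advance and write the rest */
    }

    if (written)
        *written = done;
    return ret;
}
```

A caller would use write_all(fd, buf, len, &done) and, on -1, look at
both errno and done to learn how much of the file was modified before
the failure.
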
The only way I can think of is for a DIO write to check early enough
that write(N) will complete with all N bytes and without an error. Is it
possible to guarantee that completely?

Leaving it as it is, is incorrect, as the artificial test case shows.
You should not change the file and yet report an error to the user for
the same write() call. It should either be an error with the file
contents unchanged, or a change in contents with the number of bytes
written returned.

--
Goldwyn