On Mon, Jan 17, 2022 at 12:19:29PM +0200, Linus Torvalds wrote:
> On Mon, Jan 17, 2022 at 11:57 AM David Howells <dhowells@xxxxxxxxxx> wrote:
> >
> > Do you have an opinion on whether it's permissible for a filesystem to write
> > into the read() buffer beyond the amount it claims to return, though still
> > within the specified size of the buffer?
> 
> I'm pretty sure that would seriously violate POSIX in the general
> case, and maybe even break some programs that do fancy buffer
> management (i.e., I could imagine some circular buffer thing that
> expects any "unwritten" ('unread'?) parts to stay with the old
> contents).
> 
> That said, that's for generic 'read()' cases for things like ttys or
> pipes etc that can return partial reads in the first place.
> 
> If it's a regular file, then any partial read *already* violates
> POSIX, and nobody sane would do any such buffer management because
> it's supposed to be a 'can't happen' thing.
> 
> And since you mention DIO, that's doubly true, and is already outside
> basic POSIX, and has already violated things like "all or nothing"
> rules for visibility of writes-vs-reads (which admittedly most Linux
> filesystems have violated even outside of DIO, since the strictest
> reading of the rules is incredibly nasty anyway). But filesystems
> like XFS, which took some of the strict rules more seriously, already
> ignored them for DIO, afaik.

I think for DIO, you're sacrificing the entire buffer with any
filesystem. If the underlying file is split across multiple drives, or
is even just fragmented on a single drive, we'll submit multiple BIOs
which will complete independently (even for SCSI, which writes
sequentially; never mind NVMe, which can DMA blocks asynchronously).
It might be more apparent in a networking situation where errors are
more common, but it's always been a possibility since Linux introduced
DIO.
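To make the buffer-management concern above concrete, here is a
minimal, hypothetical sketch of the userspace pattern being described:
the caller pre-fills a buffer and relies on a short read() leaving
everything past the returned count untouched. None of these names come
from the thread; they're purely illustrative.

    /*
     * Hypothetical example: the caller expects that a short read()
     * modifies only the first n bytes, so the pre-filled defaults in
     * the tail of the struct survive.  A filesystem that scribbles
     * past the count it returns (but within sizeof(*s)) would
     * silently destroy those defaults.
     */
    #include <string.h>
    #include <unistd.h>

    struct settings {
            int  verbosity;
            int  timeout_ms;
            char name[32];
    };

    static const struct settings defaults = {
            .verbosity  = 1,
            .timeout_ms = 5000,
            .name       = "default",
    };

    static ssize_t load_settings(int fd, struct settings *s)
    {
            /* Start from defaults; a short read is expected to leave
             * the unread fields holding these values. */
            memcpy(s, &defaults, sizeof(*s));
            return read(fd, s, sizeof(*s));
    }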
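And a sketch of what the DIO point implies for userspace, under the
assumption stated above (BIOs for different extents complete
independently, so an error leaves the whole buffer in an undefined
state): the caller must retry from scratch rather than trust any
prefix of the buffer. The alignment and size constants are
illustrative; the real requirement depends on the device's logical
block size.

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define DIO_ALIGN 4096          /* assumed logical block size */
    #define BUF_SIZE  (1 << 20)     /* 1 MiB read */

    static ssize_t dio_read_all(const char *path, void **out)
    {
            void *buf;
            ssize_t n;
            int fd;

            /* O_DIRECT requires an aligned buffer, offset and length. */
            if (posix_memalign(&buf, DIO_ALIGN, BUF_SIZE))
                    return -1;

            fd = open(path, O_RDONLY | O_DIRECT);
            if (fd < 0) {
                    free(buf);
                    return -1;
            }

            n = pread(fd, buf, BUF_SIZE, 0);
            close(fd);

            if (n < 0) {
                    /* Don't salvage a prefix: the component BIOs may
                     * have completed in any order, so all BUF_SIZE
                     * bytes are suspect after an error. */
                    free(buf);
                    return -1;
            }

            *out = buf;
            return n;
    }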