On Tue 04-06-24 13:14:05, Max Kellermann wrote:
> On Tue, Jun 4, 2024 at 12:41 PM Jan Kara <jack@xxxxxxx> wrote:
> > Well, I can see your pain but after all the kernel does exactly what
> > userspace has asked for?
>
> That is a valid point of view; indeed the kernel's behavior is correct
> according to the specification, but that was not my point.
>
> This is about an exotic problem that occurs only in very rare
> circumstances (depending on hard disk speed, network speed and
> timing), but when it occurs, it blocks the calling process for a very
> long time, which can then cause problems more serious than user
> unhappiness (e.g. expiring timeouts). (As I said, nginx had to work
> around this problem.)
>
> I'd like to optimize this special case, and adjust the kernel to
> always behave like the common case.
>
> > After all there's no substantial difference between userspace
> > issuing a 2GB read(2) and a 2GB sendfile(2).
>
> I understand your fear of breaking userspace, but this doesn't apply
> here, because yes, there is indeed a substantial difference: in the
> normal case, sendfile() stops when the destination socket buffer is
> full. That is the normal mode of operation, which all applications
> must be prepared for, because short sendfile() calls happen all the
> time; that's the common case.
>
> My patch is ONLY about fixing that exotic special case where the
> socket buffer is drained over and over while sendfile() still runs.

OK, so that was not clear to me (and this may well be just my ignorance
of networking details). Are you saying that your patch changes the
behavior only for this corner case? Even if the socket fd is blocking?
AFAIU, with your patch we'd return a short write in that case as well
(roughly 64k AFAICT, because that's the amount the internal splice pipe
will take), but currently we block waiting for more space in the socket
buffers?

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
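
For readers following along, here is a minimal sketch of the "common
case" Max describes: a userspace loop that already tolerates short
sendfile() returns. The helper name send_all is hypothetical; this is
not code from the patch under discussion.

/* Sketch: drive sendfile() in a loop, treating short returns
 * as the normal mode of operation. Hypothetical example. */
#include <sys/sendfile.h>
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>

/* Send 'count' bytes from in_fd (a regular file) to out_fd (a socket).
 * Returns 0 on success, -1 on error. */
int send_all(int out_fd, int in_fd, off_t offset, size_t count)
{
	while (count > 0) {
		ssize_t n = sendfile(out_fd, in_fd, &offset, count);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			/* EAGAIN on a non-blocking socket: wait for
			 * POLLOUT (poll/epoll) and retry. */
			return -1;
		}
		if (n == 0)
			break;	/* EOF on the input file */
		count -= (size_t)n;	/* short return: loop and resend */
	}
	return 0;
}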
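
As for the ~64k figure: sendfile()'s internal splice pipe is not
visible from userspace, but a hypothetical probe of a freshly created
pipe shows the same default capacity (16 pages, i.e. 65536 bytes on
most configurations), queried via fcntl(F_GETPIPE_SZ), available since
Linux 2.6.35.

/* Hypothetical probe of the default Linux pipe capacity. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fds[2];

	if (pipe(fds) < 0)
		return 1;
	/* F_GETPIPE_SZ returns the pipe capacity in bytes. */
	printf("default pipe capacity: %d bytes\n",
	       fcntl(fds[0], F_GETPIPE_SZ));
	close(fds[0]);
	close(fds[1]);
	return 0;
}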