On Wed, Sep 20, 2023 at 7:28 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> I think adding the flag for this case makes sense, and also exposing it
> on the UAPI side.

OK. I suggest we get this patch merged first, and then I'll prepare a
patch for wiring this into uapi: changing SPLICE_F_NOWAIT to 0x10 (the
lowest free bit), adding it to SPLICE_F_ALL and documenting it.

(If you prefer to have it all in this initial patch, I can amend and
resubmit it with the uapi feature.)

> My only concern is full coverage of it. We can't
> really have a SPLICE_F_NOWAIT flag that only applies to some cases.

The feature is already part of uapi - via RWF_NOWAIT, which maps to
IOCB_NOWAIT, just like my proposed SPLICE_F_NOWAIT flag. The semantics
(and the concerns) are the same, aren't they?

> That said, asking for a 2G splice, and getting a 2G splice no matter how
> slow it may be, is a bit of a "doctor it hurts when I..." scenario.

I understand this argument, but I disagree. Compare recv(socket) with
read(regular_file).

A read(regular_file) must block until the given buffer is filled
completely (or EOF is reached), which is good for programs that do not
handle partial reads, but other programs might be happy with a partial
read and prefer lower latency. There is preadv2(RWF_NOWAIT), but if it
returns EAGAIN, userspace cannot know when data will be available,
because you can't epoll() regular files. There's no way to make a read()
return at least one byte without waiting for more (not even with
preadv2(), unfortunately).

recv(socket) (or reading on a pipe) behaves differently - it blocks only
until at least one byte arrives, and callers must be able to deal with
partial reads. That's good for latency - imagine recv() behaved like
read(); how much data do you ask the kernel to receive? If it's too
little, you need many system calls; if it's too much, your process may
block indefinitely.
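
The pipe behaviour described above can be seen from userspace with a
small sketch. It uses Python's os module only as a thin wrapper around
pipe(2), write(2) and read(2); the helper name partial_pipe_read and the
buffer sizes are made up for illustration, and the payload is assumed to
be smaller than the pipe capacity so that the write itself cannot block.

```python
import os


def partial_pipe_read(payload: bytes, bufsize: int = 65536) -> bytes:
    """Write payload to a fresh pipe, then read() with a larger buffer.

    Hypothetical helper for illustration; payload is assumed to be
    non-empty and small enough to fit in the pipe.
    """
    r, w = os.pipe()
    try:
        os.write(w, payload)
        # read() on a pipe returns as soon as some data is available;
        # it does not wait until bufsize bytes have arrived.
        return os.read(r, bufsize)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    data = partial_pipe_read(b"hello")
    print(len(data), "bytes returned for a 65536-byte request")
```

The caller asks for up to 64 KiB and gets back only what the pipe
already holds, which is the "at least one byte, but don't wait for
more" behaviour that a regular-file read() cannot offer.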
read(regular_file) behaves that way for historical reasons and we can't
change it, only add new APIs like preadv2(); but splice() is a modern
API that we can optimize for how we want it to behave - and that is:
copy as much as the kernel already has, but don't block after that (in
order to avoid huge latencies).

My point is: splice(2G) is a very reasonable thing to do if userspace
wants the kernel to transfer as much as possible with a single system
call, because there's no way for userspace to know what the best number
is, so let's just pass the largest valid value.

Max