On 7/7/23 1:10?PM, Linus Torvalds wrote: > On Fri, 7 Jul 2023 at 10:21, Christian Brauner <brauner@xxxxxxxxxx> wrote: >> >> Forgot to say, fwiw, I've been running this through the LTP splice, >> pipe, and ipc tests without issues. A hanging reader can be signaled >> away cleanly with this. > > So that patch still has a couple of "wait for this" cases remaining. > > In particular, when we do a read, and we do have pipe buffers, both > the read() system call and a number of internal splice functions will > go "Ahh, I have data", and then do pipe_buf_confirm() and read it. > > Which then results in pipe_buf_confirm() blocking. It now blocks > interruptibly, which is much nicer, but several of these users *could* > just do a non-blocking confirmation instead, and wait for pipe > readability. > > HOWEVER, that's slightly less trivial than you'd expect, because the > "wait for readability" needs to be done without the pipe lock held - > so you can't actually check the pipe buffer state at that point (since > you need the pipe lock to look up the buffer). > > That's true even of "trivial" cases like actual user-space "read() > with O_NONBLOCK and poll()" situations. > > Now, the solution to all this is *fairly* straightforward: > > (a) don't use "!pipe_empty()" for a readability check. > > We already have "pipe_readable()", but it's hidden in fs/pipe.c, > so all the splice() code ended up writing the "does this pipe have > data" using "!pipe_empty()" instead. > > (b) make "pipe_buf_confirm()" take a "non-blocking" boolean argument, > and if it is non-blocking but hits one of those blocked pages, set > "pipe->not_ready", and return -EAGAIN. > > This is ok, because "pipe_buf_confirm()" is always under the pipe > lock, and we'll just clear "pipe->not_ready" under the pipe lock after > finalizing all those pages (and before waking up readers) > > (c) make "pipe_wait_readable()" and "poll()" know about this all, so > that we wait properly for a pipe that was not ready to become ready > > This all makes *most* users deal properly with these blocking events. > In particular, things like splice_to_socket() can now do the whole > proper "wait without holding the pipe lock" sequence, even when the > pipe is not empty, just in this blocked state. > > This *may* also make all the cases Jens had with io_uring and splicing > JustWork(tm). Exactly! I was reading this thread with excitement just now, would be nice to get rid of that kludge. > NOTE! NOTE! NOTE! Once more, this "feels right to me", and I'd argue > that the basic approach is fairly straightfoward. The patch is also > not horrendous. It all makes a fair amount of sense. BUT! I haven't > tested this, and like the previous patch, I really would want people > to think about this a lot. > > Comments? Jens? I'll take a closer look at this, but won't be until Monday most likely. But the approach seems sane, and going in a more idiomatic direction than before. So seems promising. -- Jens Axboe