On Wed, Feb 26, 2025 at 2:19 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > On Mon, Feb 24, 2025 at 03:24:32PM +0100, Oleg Nesterov wrote: > > On 02/24, Sapkal, Swapnil wrote: > > > Whenever I compare the case where was_full would have been set but > > > wake_writer was not set, I see the following pattern: > > > > > > ret = 100 (Read was successful) > > > pipe_full() = 1 > > > total_len = 0 > > > buf->len != 0 > > > > > > total_len is computed using iov_iter_count() while the buf->len is the > > > length of the buffer corresponding to tail(pipe->bufs[tail & mask].len). > > > Looking at pipe_write(), there seems to be a case where the writer can make > > > progress when (chars && !was_empty) which only looks at iov_iter_count(). > > > Could it be the case that there is still room in the buffer but we are not > > > waking up the writer? > > > > I don't think so, but perhaps I am totally confused. > > > > If the writer sleeps on pipe->wr_wait, it has already tried to write into > > the pipe->bufs[head - 1] buffer before the sleep. > > > > Yes, the reader can read from that buffer, but this won't make it more "writable" > > for this particular writer, "PAGE_SIZE - buf->offset + buf->len" won't be changed. > > While I think the now-removed wakeup was indeed hiding a bug, I also > think the write thing pointed out above is a fair point (orthogonal > though). > > The initial call to pipe_write allows for appending to an existing page. > > However, should the pipe be full, the loop which follows it insists on > allocating a new one and waits for a slot, even if ultimately *there is* > space now. > > The hackbench invocation used here passes around 100 bytes. > > Both readers and writers do rounds over pipes issuing 100 byte-sized > ops. > > Suppose the pipe does not have space to hold the extra 100 bytes. The > writer goes to sleep and waits for the tail to move. A reader shows up, > reads 100 bytes (now there is space!) but since the current buf was not > depleted it does not mess with the tail. > > The bench spawns tons of threads, ensuring there is a lot of competition > for the cpu time. The reader might get just enough time to largely > deplete the pipe to a point where there is only one buf in there with > space in it. Should pipe_write() be invoked now it would succeed > appending to a page. But if the writer was already asleep, it is going > to insist on allocating a new page. Now that I sent the e-mail, I realized the page would have unread data after some offset, so there is no room to *append* to it, unless one wants to memmove everythiing back. Please ignore this bit :P However, the suggestion below stands: > > As for the bug, I don't see anything obvious myself. > > However, I think there are 2 avenues which warrant checking. > > Sapkal, if you have time, can you please boot up the kernel which is > more likely to run into the problem and then run hackbench as follows: > > 1. with 1 fd instead of 20: > > /usr/bin/hackbench -g 16 -f 1 --threads --pipe -l 100000 -s 100 > > 2. with a size which divides 4096 evenly (e.g., 128): > > /usr/bin/hackbench -g 1 -f 20 --threads --pipe -l 100000 -s 128 -- Mateusz Guzik <mjguzik gmail.com>