Re: Mutex free Pipes


 



On Wed, Jan 22, 2025 at 3:27 AM Nick Renner <nr2185@xxxxxxx> wrote:
>
> On Tue, Jan 21, 2025 at 3:34 AM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> > Most of the time when something wasn't done it is either because it's a bad
> > idea or (more likely) because nobody sat down to do it.
> Ok, thanks for this. Realistically, seeing if there's a real answer here
> was the main hope of my post, but I'm assuming you're probably correct
> here.
>

Whether things work out or not I definitely encourage you to keep digging.

Worst case scenario you had some fun, best case you made pipes on Linux faster.

> My understanding of kfifo is that read and write operations have to be
> locked at their respective end when there is more than one reader or
> writer, but the data structure doesn't have to be globally locked.
> This could be accomplished with spinlocks for both ends similar to use
> in kfifo_in_spinlocked etc. Pipes already keep track of the number of
> readers or writers, and could use the non-spinlocked path in SPSC
> situations. My assumption is that a majority of pipe usage is in SPSC
> situations so allowing for this much faster version seems desirable.
>

I was under the mistaken assumption that you were looking to make this
fully lockless (because of "mutex free") and consequently did a poor
job commenting.

So let me try to straighten this out.

I can confirm:
1. you can make a fifo work with separate locks for the read and write ends.
2. the vast majority of the time there is at most one thread writing and
one thread reading.

On top of that I can also confirm that contention on the mutex is a
significant part of the overhead.

Thus patching it up is most welcome.

So where is the "but"?

First, a short remark: apart from what happens when a reader and a
writer are active at the same time, the current code suffers a
significant performance loss even from a single-threaded standpoint. I
poked around some months back; there are things which can be done to
reduce the total overhead and lock hold time (and thus contention)
while retaining the current scheme. The crux is that whatever the
rework, it must not be slower single-threaded, and preferably it
should be faster.

Now to the point: in contrast to a regular fifo, the area backing the
total capacity of the pipe is not necessarily fully allocated -- it is
backed by page-sized buffers allocated and freed as needed.

I think the default capacity is 64K. Only ever using what's needed
(rounded up to a multiple of page size) saves memory for most pipes,
and I presume this property will have to stay.

A sample problem this poses for the lockless approach: after finishing
up a buffer, the reader will need to free it, but this needs to be
synchronized against a writer which may just have started writing
there.

It may be that there is a trivial solution which does not partially
defeat the point or slow down single-threaded operation, but as it
stands I don't see it (then again, I have not seriously dug into it).

This is not the entire story (and I'm not an expert on Linux pipes),
but hopefully showcases the gist of the problem as I see it.

That said, per what I mentioned above, digging here is warranted for
sure; I'm just not at all confident the proposed approach will do the
trick.

Good luck & have fun :)
-- 
Mateusz Guzik <mjguzik gmail.com>




