Mateusz, I'm afraid my emails can look as if I am trying to deny the
problem. No, it's just that I think we need to understand why exactly
this patch makes a difference.

On 01/20, Mateusz Guzik wrote:
>
> While I'm too tired to dig into the code at the moment,

Me too.

> I checked how often the sucker goes off cpu, like so:
> bpftrace -e 'kprobe:schedule { @[kstack()] = count(); }'
>
> With your patch I reliably get about 38 mln calls from pipe_read.
> Without your patch this drops to about 17 mln, as in less than half.

Heh ;) I don't use bpftrace, but with the help of printk() I too noticed
the difference (although not that big) when I tried to understand the
1st report
https://lore.kernel.org/all/202501101015.90874b3a-lkp@xxxxxxxxx/

Not that I really understand this difference, but I am not really
surprised. With this patch the writers get more CPU time (due to the
unnecessary wakeups).

What really surprises me is that (with or without this patch) the
readers call wait_event/schedule MUUUUUUUUUUUUCH more often than the
writers.

I guess this is because sender() and receiver() are not "symmetric":
sender() writes to a "random" fd, while receiver() always reads from the
same ctx->in_fds[0]... Still not clear to me. And I don't understand
what workload this logic tries to simulate, but this doesn't matter.

Oleg.