Re: io_uring_prep_openat_direct() and link/drain

Miklos Szeredi <miklos@xxxxxxxxxx> · Thu, 21 Apr 2022 14:39:07 +0200

On Thu, 21 Apr 2022 at 14:34, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 4/21/22 6:31 AM, Miklos Szeredi wrote:
> > On Tue, 5 Apr 2022 at 16:44, Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>
> >> On 4/5/22 1:45 AM, Miklos Szeredi wrote:
> >>> On Sat, 2 Apr 2022 at 03:17, Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>>
> >>>> On 4/1/22 10:21 AM, Jens Axboe wrote:
> >>>>> On 4/1/22 10:02 AM, Miklos Szeredi wrote:
> >>>>>> On Fri, 1 Apr 2022 at 17:36, Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>>> I take it you're continually reusing those slots?
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>>>  If you have a test
> >>>>>>> case that'd be ideal. Agree that it sounds like we just need an
> >>>>>>> appropriate breather to allow fput/task_work to run. Or it could be the
> >>>>>>> deferral free of the fixed slot.
> >>>>>>
> >>>>>> Adding a breather could make the worst case latency be large.  I think
> >>>>>> doing the fput synchronously would be better in general.
> >>>>>
> >>>>> fput() isn't sync, it'll just offload to task_work. There are some
> >>>>> dependencies there that would need to be checked. But we'll find a way
> >>>>> to deal with it.
> >>>>>
> >>>>>> I test this on a VM with 8G of memory and run the following:
> >>>>>>
> >>>>>> ./forkbomb 14 &
> >>>>>> # wait till 16k processes are forked
> >>>>>> for i in `seq 1 100`; do ./procreads u; done
> >>>>>>
> >>>>>> You can compare performance with plain reads (./procreads p), the
> >>>>>> other tests don't work on public kernels.
> >>>>>
> >>>>> OK, I'll check up on this, but probably won't have time to do so before
> >>>>> early next week.
> >>>>
> >>>> Can you try with this patch? It's not complete yet, there's actually a
> >>>> bunch of things we can do to improve the direct descriptor case. But
> >>>> this one is easy enough to pull off, and I think it'll fix your OOM
> >>>> case. Not a proposed patch, but it'll prove the theory.
> >>>
> >>> Sorry for the delay.
> >>>
> >>> Patch works like a charm.
> >>
> >> OK good, then it is the issue I suspected. Thanks for testing!
> >
> > Tested with v5.18-rc3 and performance seems significantly worse than
> > with the test patch:
> >
> > test patch:
> >         avg     min     max     stdev
> > real    0.205   0.190   0.266   0.011
> > user    0.017   0.007   0.029   0.004
> > sys     0.374   0.336   0.503   0.022
> >
> > 5.18.0-rc3-00016-gb253435746d9:
> >         avg     min     max     stdev
> > real    0.725   0.200   18.090  2.279
> > user    0.019   0.005   0.046   0.006
> > sys     0.454   0.241   1.022   0.199
>
> It's been a month and I don't remember the details of which patches were
> tested. When you say "test patch", which one exactly are you referring
> to, and what base was it applied on?

https://lore.kernel.org/all/47912c4c-ccc2-0678-6c2f-3e3c0dd1f04b@xxxxxxxxx/

The base is a good question; it was after the basic fixed slot
assignment issues were fixed.

Thanks,
Miklos