On 4/21/22 6:39 AM, Miklos Szeredi wrote:
> On Thu, 21 Apr 2022 at 14:34, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 4/21/22 6:31 AM, Miklos Szeredi wrote:
>>> On Tue, 5 Apr 2022 at 16:44, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> On 4/5/22 1:45 AM, Miklos Szeredi wrote:
>>>>> On Sat, 2 Apr 2022 at 03:17, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 4/1/22 10:21 AM, Jens Axboe wrote:
>>>>>>> On 4/1/22 10:02 AM, Miklos Szeredi wrote:
>>>>>>>> On Fri, 1 Apr 2022 at 17:36, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> I take it you're continually reusing those slots?
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> If you have a test
>>>>>>>>> case that'd be ideal. Agree that it sounds like we just need an
>>>>>>>>> appropriate breather to allow fput/task_work to run. Or it could be the
>>>>>>>>> deferral free of the fixed slot.
>>>>>>>>
>>>>>>>> Adding a breather could make the worst case latency be large. I think
>>>>>>>> doing the fput synchronously would be better in general.
>>>>>>>
>>>>>>> fput() isn't sync, it'll just offload to task_work. There are some
>>>>>>> dependencies there that would need to be checked. But we'll find a way
>>>>>>> to deal with it.
>>>>>>>
>>>>>>>> I test this on an VM with 8G of memory and run the following:
>>>>>>>>
>>>>>>>> ./forkbomb 14 &
>>>>>>>> # wait till 16k processes are forked
>>>>>>>> for i in `seq 1 100`; do ./procreads u; done
>>>>>>>>
>>>>>>>> You can compare performance with plain reads (./procreads p), the
>>>>>>>> other tests don't work on public kernels.
>>>>>>>
>>>>>>> OK, I'll check up on this, but probably won't have time to do so before
>>>>>>> early next week.
>>>>>>
>>>>>> Can you try with this patch? It's not complete yet, there's actually a
>>>>>> bunch of things we can do to improve the direct descriptor case. But
>>>>>> this one is easy enough to pull off, and I think it'll fix your OOM
>>>>>> case. Not a proposed patch, but it'll prove the theory.
>>>>>
>>>>> Sorry for the delay..
>>>>>
>>>>> Patch works like charm.
>>>>
>>>> OK good, then it is the issue I suspected. Thanks for testing!
>>>
>>> Tested with v5.18-rc3 and performance seems significantly worse than
>>> with the test patch:
>>>
>>> test patch:
>>>         avg    min    max  stdev
>>> real  0.205  0.190  0.266  0.011
>>> user  0.017  0.007  0.029  0.004
>>> sys   0.374  0.336  0.503  0.022
>>>
>>> 5.18.0-rc3-00016-gb253435746d9:
>>>         avg    min     max  stdev
>>> real  0.725  0.200  18.090  2.279
>>> user  0.019  0.005   0.046  0.006
>>> sys   0.454  0.241   1.022  0.199
>>
>> It's been a month and I don't remember details of which patches were
>> tested, when you say "test patch", which one exactly are you referring
>> to and what base was it applied on?
>
> https://lore.kernel.org/all/47912c4c-ccc2-0678-6c2f-3e3c0dd1f04b@xxxxxxxxx/
>
> The base is a good question, it was after the basic fixed slot
> assignment issues were fixed.

Gotcha, ok then this makes sense. The ordering issues were sorted out
for 5.18-rc3, but the direct descriptor optimization is only in the 5.19
branch.

--
Jens Axboe