On Mon, Nov 1, 2021 at 8:44 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, Nov 1, 2021 at 1:37 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Fri 29-10-21 09:07:39, Suren Baghdasaryan wrote:
> > > On Fri, Oct 29, 2021 at 6:03 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > [...]
> > > > Well, I still do not see why that is a problem. This syscall is meant
> > > > to release the address space, not to do it fast.
> > >
> > > It's the same problem for a userspace memory reaper as for the
> > > oom-reaper. The goal is to release the memory of the victim and to
> > > quickly move on to the next one if needed.
> >
> > The purpose of the oom_reaper is to _guarantee_ forward progress. It
> > doesn't have to be quick or optimized for speed.
>
> Fair enough. Then the same guarantees should apply to userspace memory
> reapers. I think you clarified that well in your replies in
> https://lore.kernel.org/all/20170725154514.GN26723@xxxxxxxxxxxxxx:
>
> Because there is no _guarantee_ that the final __mmput will release
> the memory in finite time. And we cannot guarantee that long term.
> ...
> __mmput calls into exit_aio and that can wait for completion and there
> is no way to guarantee this will finish in finite time.
>
> > [...]
> > > > Btw. the above code will not really tell you much on a larger machine
> > > > unless you manage to trigger mmap_sem contention. Otherwise you are
> > > > measuring the mmap_sem writelock fast path, and that should be well
> > > > within the noise compared to the whole address space destruction time.
> > > > If that is not the case then we have a real problem with the locking...
> > >
> > > My understanding of that discussion is that the concern was that even
> > > taking the uncontended mmap_sem writelock would regress the exit path.
> > > That was what I wanted to confirm. Am I misreading it?
> >
> > No, your reading matches my recollection.
> > I just think that code
> > robustness in exchange for a rw semaphore write lock fast path is a
> > reasonable price to pay, even if that has some effect on micro
> > benchmarks.
>
> I'm with you on this one; that's why I wanted to measure the price we
> would pay. Below are the test results:
>
> Test: https://lore.kernel.org/all/20170725142626.GJ26723@xxxxxxxxxxxxxx/
> Compiled: gcc -O2 -static test.c -o test
> Test machine: 128 core / 256 thread 2x AMD EPYC 7B12 64-Core Processor
> (family 17h)
>
> baseline (Linus master, f31531e55495ca3746fb895ffdf73586be8259fa)
> p50 (median)  87412
> p95          168210
> p99          190058
> average      97843.8
> stdev        29.85%
>
> unconditional mmap_write_lock in exit_mmap (last column is the change
> from the baseline)
> p50 (median)  88312   +1.03%
> p95          170797   +1.54%
> p99          191813   +0.92%
> average      97659.5  -0.19%
> stdev        32.41%
>
> unconditional mmap_write_lock in exit_mmap + Matthew's patch (last
> column is the change from the baseline)
> p50 (median)  88807   +1.60%
> p95          167783   -0.25%
> p99          187853   -1.16%
> average      97491.4  -0.36%
> stdev        30.61%
>
> stdev is quite high in all cases, so the test is very noisy.

To clarify: what I called "stdev" here is actually stdev / average, in %.

> The impact seems quite low IMHO. WDYT?
>
> > --
> > Michal Hocko
> > SUSE Labs