On 2022-01-26 16:20:36 [+0100], Michal Hocko wrote:
> I do not see any obvious problem with this patch. The code is ugly as
> hell, though, but a large part of that is because of the weird locking
> scheme we already have. I've had a look at 559271146efc ("mm/memcg:
> optimize user context object stock access") and while I agree that it
> makes sense to optimize for user context I do not really see any numbers
> justifying the awkward locking scheme. Is this complexity really worth
> it?

From https://lore.kernel.org/all/YdX+INO9gQje6d0S@xxxxxxxxxxxxx/:

|            Sandy Bridge        Haswell        Skylake    AMD-A8 7100           Zen2          ARM64
| PREEMPT   5,123,896,822  5,215,055,226  5,077,611,590  6,012,287,874  6,234,674,489 20,000,000,100
| IRQ       7,494,119,638  6,810,367,629 10,620,130,377  4,178,546,086  4,898,076,012 13,538,461,925

Basically, if PREEMPT < IRQ then preempt_disable() + preempt_enable()
was cheaper than local_irq_save() + local_irq_restore().

| Sandy Bridge
|                     SERVER OPT   SERVER NO-OPT     PREEMPT OPT  PREEMPT NO-OPT
| ALLOC/FREE       8,519,295,176   9,051,200,652  10,627,431,395  11,198,189,843
| SD                   5,309,768      29,253,976     129,102,317      40,681,909
| ALLOC/FREE BH    9,996,704,330   8,927,026,031  11,680,149,900  11,139,356,465
| SD                  38,237,534      72,913,120      23,626,932     116,413,331

OPT is the code as-is, while NO-OPT is with the following patch, which
disables the optimisation (so it should be equivalent to a revert of the
optimisation commit). ALLOC/FREE is kfree(kmalloc()). ALLOC/FREE BH is
the same, but with in_interrupt() reporting true. The numbers are the
time, in ns, needed for 100,000,000 iterations of the free+alloc. SD is
the standard deviation.

I also let the test run on a Zen2 box:

|                     SERVER OPT   SERVER NO-OPT     PREEMPT OPT  PREEMPT NO-OPT
| ALLOC/FREE       8,126,735,313   8,751,307,383   9,822,927,142  10,045,105,425
| SD                 100,806,471      87,234,047      55,170,179      25,832,386
| ALLOC/FREE BH    9,197,455,885   8,394,337,053  10,671,227,095   9,904,954,934
| SD                 155,223,919      57,800,997      47,529,496     105,260,566

Is this what you asked for?

Sebastian