On Thu 27-01-22 12:53:40, Sebastian Andrzej Siewior wrote: > On 2022-01-26 16:20:36 [+0100], Michal Hocko wrote: > > I do not see any obvious problem with this patch. The code is ugly as > > hell, though, but a large part of that is because of the weird locking > > scheme we already have. I've had a look at 559271146efc ("mm/memcg: > > optimize user context object stock access") and while I agree that it > > makes sense to optimize for user context I do not really see any numbers > > justifying the awkward locking scheme. Is this complexity really worth > > it? > > >From https://https://lkml.kernel.org/r/.kernel.org/all/YdX+INO9gQje6d0S@xxxxxxxxxxxxx/: > > | Sandy Bridge Haswell Skylake AMD-A8 7100 Zen2 ARM64 > |PREEMPT 5,123,896,822 5,215,055,226 5,077,611,590 6,012,287,874 6,234,674,489 20,000,000,100 > |IRQ 7,494,119,638 6,810,367,629 10,620,130,377 4,178,546,086 4,898,076,012 13,538,461,925 > > basically if PREEMPT < IRQ then preempt_disable() + enable() was cheaper > than local_irq_save() + restore(). > > | Sandy Bridge > | SERVER OPT SERVER NO-OPT PREEMPT OPT PREEMPT NO-OPT > | ALLOC/FREE 8,519,295,176 9,051,200,652 10,627,431,395 11,198,189,843 > | SD 5,309,768 29,253,976 129,102,317 40,681,909 > | ALLOC/FREE BH 9,996,704,330 8,927,026,031 11,680,149,900 11,139,356,465 > | SD 38,237,534 72,913,120 23,626,932 116,413,331 > > OPT is code as-is while "NO-OPT" is with the following patch which > disables the optimisation (so it should be a revert of the optimisation > commit). > > ALLOC/FREE is kfree(kmalloc()). > ALLOC/FREE BH is the same but in_interrupt() reported true. > The numbers are are time needed in ns for 100,000,000 iterations of the > free+alloc. SD is standard deviation. > I also let the test run on a Zen2 box: > > | SERVER OPT SERVER NO-OPT PREEMPT OPT PREEMPT NO-OPT > | ALLOC/FREE 8,126,735,313 8,751,307,383 9,822,927,142 10,045,105,425 > | SD 100,806,471 87,234,047 55,170,179 25,832,386 > | ALLOC/FREE BH 9,197,455,885 8,394,337,053 10,671,227,095 9,904,954,934 > | SD 155,223,919 57,800,997 47,529,496 105,260,566 > > Is this what you asked for? Thanks! This gives us some picture from the microbenchmark POV. I was more interested in some real life representative benchmarks. In other words does the optimization from Weiman make any visible difference for any real life workload? Sorry, I know that this all is not really related to your work but if the original optimization is solely based on artificial benchmarks then I would rather drop it and also make your RT patchset easier. -- Michal Hocko SUSE Labs