On Jul 8, 2022, at 11:54 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

> On 7/8/22 10:04, Nadav Amit wrote:
>> On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>> On 7/7/22 17:30, Nadav Amit wrote:
>>> You might want to fix the clock on the system from which you sent this.
>>> I was really scratching my head trying to figure out how you got this
>>> patch out before Hugh's bug report.
>>>
>>>> From: Nadav Amit <namit@xxxxxxxxxx>
>>>>
>>>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
>>>> possible") introduced an optimization of skipping the flush if the TLB
>>>> generation that is flushed (as provided in flush_tlb_info) was already
>>>> flushed.
>>>>
>>>> However, arch_tlbbatch_flush() does not provide any generation in
>>>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
>>>> TLB flushes.
>>>>
>>>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
>>>> anyhow is an invalid generation value.
>>>
>>> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker
>>> for f->new_tlb_gen being invalid. Being consistent seems like a good
>>> idea on this stuff.
>>
>> If we get a request to do a flush, regardless whether full or partial,
>> that logically we have already done, there is not reason to do it.
>>
>> I therefore do not see a reason to look on f->end. I think that looking
>> at the generation is very intuitive. If you want, I can add a constant
>> such as TLB_GENERATION_INVALID.
>
> That's a good point.
>
> But, _my_ point was that there was only really one read site of
> f->new_tlb_gen in flush_tlb_func(). That site is guarded by the "f->end
> != TLB_FLUSH_ALL" check which prevented it from making the same error
> that your patch did.
>
> Whatever we do, it would be nice to have a *single* way to check for
> "does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?"
>
> Using something like TLB_GENERATION_INVALID seems reasonable to me.

I am not sure I fully understood what you meant. I understand that you
do want TLB_GENERATION_INVALID, and I think you are asking for some
assertions regarding the expected relationship between
TLB_GENERATION_INVALID and TLB_FLUSH_ALL.

To shorten the discussion, I will send v2 (very soon) so you can
comment on the patch itself.

I did run the tests again to measure performance, as you asked, and the
results are similar (will-it-scale tlb_flush1/threads): an 11% speedup
with 45 cores.

Without aa44284960d5:

tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
1,156782,97.89,157024,97.92,157024
5,707879,89.60,322015,89.69,785120
10,1311968,79.21,490881,79.68,1570240
15,1498762,68.82,553958,69.71,2355360
20,1483057,58.45,598939,60.00,3140480
25,1428105,48.07,626179,50.46,3925600
30,1648992,37.77,643954,41.36,4710720
35,1861301,27.50,671570,32.63,5495840
40,2038278,17.17,669470,24.03,6280960
45,2140412,6.71,673633,15.27,7066080

With aa44284960d5 + pending patch:

tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
1,157935,97.93,155351,97.93,157935
5,710450,89.60,324981,89.70,789675
10,1291935,79.21,498496,79.57,1579350
15,1515323,68.81,575821,69.68,2369025
20,1545172,58.46,625521,60.05,3158700
25,1501015,48.09,675856,50.62,3948375
30,1682308,37.80,705242,41.55,4738050
35,1850464,27.52,717724,32.33,5527725
40,2030726,17.17,734610,23.99,6317400
45,2136401,6.71,747257,16.07,7107075
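
A minimal, self-contained sketch of the skip logic under discussion (not
the kernel patch itself; struct tlb_request, can_skip_flush(), and the
test harness below are hypothetical names used only for illustration): a
flush request that carries TLB_GENERATION_INVALID is never skipped,
while a request whose generation this CPU has already reached may be.

/*
 * Hypothetical userspace model -- NOT the kernel code in arch/x86/mm/tlb.c.
 * It only illustrates the rule discussed above: a request without a valid
 * generation must always flush; a request whose generation this CPU has
 * already reached may be skipped.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_GENERATION_INVALID	0ULL	/* zero is never a real tlb_gen */

struct tlb_request {			/* stand-in for flush_tlb_info */
	uint64_t new_tlb_gen;		/* generation to reach, or INVALID */
	bool	 full_flush;		/* analogous to f->end == TLB_FLUSH_ALL */
};

/* May this CPU, currently at @local_tlb_gen, skip the requested flush? */
static bool can_skip_flush(const struct tlb_request *req, uint64_t local_tlb_gen)
{
	/*
	 * Batched flushes carry no generation, so they provide no
	 * "already done" information and must never be skipped,
	 * whether they ask for a full or a partial flush.
	 */
	if (req->new_tlb_gen == TLB_GENERATION_INVALID)
		return false;

	/* Skip only if this CPU has already flushed up to that generation. */
	return local_tlb_gen >= req->new_tlb_gen;
}

int main(void)
{
	struct tlb_request batched = { .new_tlb_gen = TLB_GENERATION_INVALID,
				       .full_flush  = true };
	struct tlb_request ranged  = { .new_tlb_gen = 7, .full_flush = false };

	assert(!can_skip_flush(&batched, 10));	/* never skip a batched flush */
	assert(can_skip_flush(&ranged, 10));	/* gen 7 already reached at 10 */
	assert(!can_skip_flush(&ranged, 5));	/* gen 7 not yet reached at 5 */

	puts("ok");
	return 0;
}

With a sentinel like this, the skip decision no longer needs to consult
f->end at all, which seems to be the single validity check asked for
above.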