> On Aug 23, 2021, at 1:05 AM, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Hi, Nadav,
>
> Nadav Amit <nadav.amit@xxxxxxxxx> writes:
>
>> From: Nadav Amit <namit@xxxxxxxxxx>
>>
>> flush_tlb_batched_pending() appears to have a theoretical race:
>> tlb_flush_batched is being cleared after the TLB flush, and if in
>> between another core calls set_tlb_ubc_flush_pending() and sets the
>> pending TLB flush indication, this indication might be lost. Holding the
>> page-table lock when SPLIT_LOCK is set cannot eliminate this race.
>
> Recently, when I read the corresponding code, I find the exact same race
> too. Do you still think the race is possible at least in theory? If
> so, why hasn't your fix been merged?

I think the race is possible. It didn’t get merged, IIRC, due to some
addressable criticism and lack of enthusiasm from other people, and
my laziness/busy-ness.

>
>> The current batched TLB invalidation scheme therefore does not seem
>> viable or easily repairable.
>
> I have some idea to fix this without too much code. If necessary, I
> will send it out.

Arguably, it would be preferable to have a small back-portable fix for
this issue specifically. Just try to ensure that you do not introduce
performance overheads. Any solution should be clear about its impact
on additional TLB flushes in the worst-case scenario and the number
of additional atomic operations that would be required.
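
For readers following the thread, the interleaving described in the quoted
commit message can be replayed in user-space C. This is only a minimal,
single-threaded sketch, not kernel code: struct mm_sim, stale_tlb_entries
and the sim_*() helpers are hypothetical names made up for illustration.
Only tlb_flush_batched, set_tlb_ubc_flush_pending() and
flush_tlb_batched_pending() come from the thread, and the real
implementations differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm state discussed in the thread. */
struct mm_sim {
	bool tlb_flush_batched;	/* "a batched flush is pending" indication */
	int stale_tlb_entries;	/* PTEs cleared but not yet flushed from TLBs */
};

/* Models set_tlb_ubc_flush_pending(): a PTE was unmapped, flush deferred. */
static void sim_set_flush_pending(struct mm_sim *mm)
{
	mm->stale_tlb_entries++;
	mm->tlb_flush_batched = true;
}

/* Models the first half of flush_tlb_batched_pending(): test and flush. */
static bool sim_flush_if_pending(struct mm_sim *mm)
{
	if (!mm->tlb_flush_batched)
		return false;
	mm->stale_tlb_entries = 0;	/* the flush covers everything so far */
	return true;
}

/* Models the second half: the flag is cleared only after the flush. */
static void sim_clear_pending(struct mm_sim *mm)
{
	mm->tlb_flush_batched = false;
}

/* Replays the interleaving; returns true if an indication was lost. */
static bool sim_lost_indication(void)
{
	struct mm_sim mm = { .tlb_flush_batched = false, .stale_tlb_entries = 0 };
	bool flushed;

	sim_set_flush_pending(&mm);		/* CPU B: first deferred flush */
	flushed = sim_flush_if_pending(&mm);	/* CPU A: sees the flag, flushes */
	assert(flushed && mm.stale_tlb_entries == 0);

	sim_set_flush_pending(&mm);		/* CPU B: second deferred flush */
	sim_clear_pending(&mm);			/* CPU A: clears the flag too late */

	/* A stale entry remains, but nothing records that a flush is due. */
	return mm.stale_tlb_entries > 0 && !mm.tlb_flush_batched;
}
```

In this replay sim_lost_indication() returns true: the flag ends up clear
while one stale entry remains, so a later sim_flush_if_pending() would skip
the flush. The sketch says nothing about how likely the window is on real
hardware, nor about the cost of any particular fix.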