On Fri, Jan 03, 2025 at 01:18:43PM +0100, Borislav Petkov wrote:
> On Thu, Jan 02, 2025 at 08:56:09PM +0100, Peter Zijlstra wrote:
> > Well, I've already answered why we need this in the previous thread but
> > it wasn't preserved :-(
>
> ... and this needs to be part of the commit message. And there's a similar
> comment over tlb_remove_table_smp_sync() in mm/mmu_gather.c which pretty much
> explains the same thing.
>
> > Currently GUP-fast serializes against table-free by disabling
> > interrupts, which in turn holds off the TLBI-IPIs.
> >
> > Since you're going to be doing broadcast TLBI -- without IPIs, this no
> > longer works and we need other means of serializing GUP-fast vs
> > table-free.
> >
> > MMU_GATHER_RCU_TABLE_FREE is that means.
> >
> > So where previously paravirt implementations of tlb_flush_multi might
> > require this (because of virt optimizations that avoided the TLBI-IPI),
> > this broadcast invalidate now very much requires this for native.
>
> Right, so this begs the question: we probably should do this dynamically only
> on TLBI systems - not on everything native - due to the overhead of this
> batching - I'm looking at tlb_remove_table().
>
> Or should we make this unconditional on all native because we don't care about
> the overhead and would like to have simpler code. I mean, disabling IRQs vs
> batching and allocating memory...?

The IRQ disabling on the GUP-fast side stays; it acts as an RCU read-side
section -- also, mmu_gather reverts to sending IPIs if it runs out of
memory (extremely rare).

I don't think there is measurable overhead from doing the separate table
batching, but I'm sure the robots will tell us.
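
To make the batching and the out-of-memory fallback concrete, here is a
rough user-space C model of the control flow, loosely patterned on
tlb_remove_table(). It is only a sketch, not the kernel code: every name
in it (gather_model, model_remove_table(), BATCH_MAX, the alloc_fails
knob) is made up for illustration, the "RCU" path frees immediately
instead of waiting for a grace period, and the "IPI" path is just a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical batch size; the kernel sizes a batch to fill one page. */
#define BATCH_MAX 4

struct table_batch {
	size_t nr;
	void *tables[BATCH_MAX];
};

struct gather_model {
	struct table_batch *batch;	/* current batch, NULL if none */
	bool alloc_fails;		/* test knob: simulate out-of-memory */
	unsigned int ipi_syncs;		/* fallback path: IPI sync, then free */
	unsigned int rcu_batches;	/* batches handed to the deferred path */
	unsigned int tables_freed;
};

static void model_free_table(struct gather_model *g, void *table)
{
	free(table);
	g->tables_freed++;
}

/*
 * Stand-in for handing the batch to call_rcu(). The model frees right
 * away; the real thing only frees after all RCU read-side sections
 * (which include IRQ-disabled GUP-fast walkers) have finished.
 */
static void model_flush_batch(struct gather_model *g)
{
	struct table_batch *b = g->batch;

	if (!b)
		return;

	for (size_t i = 0; i < b->nr; i++)
		model_free_table(g, b->tables[i]);

	free(b);
	g->batch = NULL;
	g->rcu_batches++;
}

static void model_remove_table(struct gather_model *g, void *table)
{
	if (!g->batch) {
		g->batch = g->alloc_fails ? NULL : calloc(1, sizeof(*g->batch));
		if (!g->batch) {
			/*
			 * No memory for a batch: fall back to a synchronous
			 * IPI (modelled as a counter) and free this one
			 * table directly.
			 */
			g->ipi_syncs++;
			model_free_table(g, table);
			return;
		}
	}

	g->batch->tables[g->batch->nr++] = table;
	if (g->batch->nr == BATCH_MAX)
		model_flush_batch(g);
}
```

In this model the only extra cost on the common path is one batch
allocation per BATCH_MAX tables; the IPI counter only moves when that
allocation fails.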