Some recent discussion on LKML [0] brought up some interesting and
useful additional context on RCU-freeing for pagetables. Note down some
extra info in here, in particular a) be concrete about the reason why an
arch might not have an IPI and b) add the interesting paravirt details.

[0] https://lore.kernel.org/linux-kernel/20250206044346.3810242-2-riel@xxxxxxxxxxx/

---
Note the Lore link in here is referring to the base of the thread. The
mail I wanted to actually refer to is not yet on Lore as it's not
currently updating. Here's what I have in my mailbox:

On Tue, 11 Feb 2025 at 12:07, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > It would be nice to update the CONFIG_MMU_GATHER_RCU_TABLE_FREE
> > comment in mm/mmu_gather.c to mention INVLPG alongside "Architectures
> > that do not have this (PPC)"
>
> Why? This is just one more architecture that does broadcast.

Hmm yeah, I didn't really make the leap from "doesn't do an IPI" to
"that just means it uses broadcast TLB flush". In that light I can see
how it's "just another architecture". I do think it would make sense to
be more explicit about that, even though it seems obvious now you point
it out. But it's not really relevant to this patchset.

> > - and while that's being updated it would
> > also be useful to note down the paravirt thing you explained above,
> > IMO it's pretty helpful to have more examples of the concrete usecases
> > for this logic.
>
> Look at kvm_flush_tlb_multi() if you're interested. The notable detail
> is that it avoids flushing TLB for vCPUs that are preempted, and instead
> lets the vCPU resume code do the invalidate.

Oh, that actually looks like a slightly different case from what Rik
mentioned?

> some paravirt TLB flush implementations
> handle the TLB flush in the hypervisor, and will do the flush
> even when the target CPU has interrupts disabled.

Do we have

a) Systems where the flush gets completely pushed into the hypervisor -
   xen_flush_tlb_multi()?

b) Systems where the guest coordinates with the hypervisor to avoid
   IPI-ing preempted vCPUs?

Maybe I can send a separate patch to note this in the commentary, it's
interesting and useful to know.

Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
---
 mm/mmu_gather.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 7aa6f18c500b2d292621ec308f575ed4ddbdcd3e..db7ba4a725d6ad445eb7f35f0b34e0d4364eb693 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -246,8 +246,16 @@ static void __tlb_remove_table_free(struct mmu_table_batch *batch)
  * IRQs delays the completion of the TLB flush we can never observe an already
  * freed page.
  *
- * Architectures that do not have this (PPC) need to delay the freeing by some
- * other means, this is that means.
+ * Not all systems IPI every CPU for this purpose:
+ *
+ * - Some architectures have HW support for cross-CPU synchronisation of TLB
+ *   flushes, so there's no IPI at all.
+ *
+ * - Paravirt guests can do this TLB flushing in the hypervisor, or coordinate
+ *   with the hypervisor to defer flushing on preempted vCPUs.
+ *
+ * Such systems need to delay the freeing by some other means, this is that
+ * means.
  *
  * What we do is batch the freed directory pages (tables) and RCU free them.
  * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling

---
base-commit: 266a5a879d40554630c7e485cb5576227759c7a0
change-id: 20250211-mmugather-comment-3ca5f41805ec

Best regards,
--
Brendan Jackman <jackmanb@xxxxxxxxxx>