On 20/01/2025 16:02, Rik van Riel wrote:
> On Mon, 2025-01-20 at 11:56 +0200, Nadav Amit wrote:
>>> @@ -1670,12 +1668,62 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>>>  		local_irq_enable();
>>>  	}
>>> +	/*
>>> +	 * If we issued (asynchronous) INVLPGB flushes, wait for them here.
>>> +	 * The cpumask above contains only CPUs that were running tasks
>>> +	 * not using broadcast TLB flushing.
>>> +	 */
>>> +	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->used_invlpgb) {
>>> +		tlbsync();
>>> +		migrate_enable();
>> Maybe someone mentioned it before, but I would emphasize that I do not
>> think that preventing migration for a potentially long time is that
>> great.
>>
>> One alternative solution would be to set a bit on cpu_tlbstate that,
>> when set, causes a tlbsync to be issued on the next context switch.
>>
>> (I can think of other solutions, but I think the one I just mentioned
>> is the cleanest one.)
> It is clean, but I'm not convinced it is good enough.
>
> We need to guarantee that the INVLPGBs have finished before we free the
> pages. Running a TLBSYNC at the next context switch could mean that the
> TLBSYNC won't run until after the pages have been freed.
>
> In practice it is probably good enough, since it would be simpler for
> TLBSYNC to just return once all pending (older) INVLPGBs have finished,
> but that is not architecturally guaranteed.
>
> We could send an IPI to the remote CPUs to have them run TLBSYNC, but
> is that really better?
I am not sure we are on the same page. What I suggested is:
1. arch_tlbbatch_flush() would still do tlbsync()
2. No migrate_enable() in arch_tlbbatch_flush()
3. No migrate_disable() in arch_tlbbatch_add_pending()
4. arch_tlbbatch_add_pending() sets cpu_tlbstate.pending_tlb_broadcast
5. switch_mm_irqs_off() checks cpu_tlbstate.pending_tlb_broadcast and,
   if it is set, performs a tlbsync and clears it.
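
A minimal sketch of what points 1-5 could look like (untested and only
illustrative; the signatures are approximate, and everything except
pending_tlb_broadcast, used_invlpgb, X86_FEATURE_INVLPGB and tlbsync()
from the patch and the points above is a placeholder):

/* arch/x86/include/asm/tlbflush.h */
struct tlb_state {
	/* ... existing fields ... */
	bool pending_tlb_broadcast;	/* INVLPGB issued, TLBSYNC still owed */
};

/* arch/x86/mm/tlb.c */

void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
			       struct mm_struct *mm, unsigned long uaddr)
{
	/* ... issue the (asynchronous) INVLPGB as before ... */

	/* 3+4: no migrate_disable(); just remember the pending broadcast */
	this_cpu_write(cpu_tlbstate.pending_tlb_broadcast, true);
}

void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
{
	/* ... flush the CPUs in batch->cpumask as before ... */

	/* 1+2: still wait here, before the pages are freed, but without
	 * a matching migrate_enable() */
	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->used_invlpgb)
		tlbsync();
}

void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
			struct task_struct *tsk)
{
	/* 5: this CPU issued an INVLPGB that has not been TLBSYNCed yet.
	 * Do it now, so a task that migrates away between
	 * arch_tlbbatch_add_pending() and arch_tlbbatch_flush() never
	 * leaves an unsynced INVLPGB behind on this CPU. */
	if (this_cpu_read(cpu_tlbstate.pending_tlb_broadcast)) {
		tlbsync();
		this_cpu_write(cpu_tlbstate.pending_tlb_broadcast, false);
	}

	/* ... rest of switch_mm_irqs_off() ... */
}

If the flush ends up running on the same CPU that issued the INVLPGB, the
flag just causes one redundant tlbsync at the next context switch.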