On 12/30/24 09:53, Rik van Riel wrote:
...
> +#ifdef CONFIG_CPU_SUP_AMD
> +	struct list_head broadcast_asid_list;
> +	u16 broadcast_asid;
> +	bool asid_transition;
> +#endif

Could we either do:

	config X86_TLB_FLUSH_BROADCAST_HW
		bool
		depends on CONFIG_CPU_SUP_AMD

or even:

	#define X86_TLB_FLUSH_BROADCAST_HW CONFIG_CPU_SUP_AMD

for the whole series, please? There are a non-trivial number of #ifdefs
here and it would be nice to know what they're for, logically. This is
a completely selfish request because Intel has a similar feature and
we're surely going to give this approach a try on Intel CPUs too.

Second, is there something that prevents you from defining a new
MM_CONTEXT_* flag instead of a new bool? It might save bloating the
context by a few words.

> #ifdef CONFIG_ADDRESS_MASKING
> 	/* Active LAM mode:  X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
> 	unsigned long lam_cr3_mask;
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 795fdd53bd0a..0dc446c427d2 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -139,6 +139,8 @@ static inline void mm_reset_untag_mask(struct mm_struct *mm)
> #define enter_lazy_tlb enter_lazy_tlb
> extern void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
>
> +extern void destroy_context_free_broadcast_asid(struct mm_struct *mm);
> +
> /*
>  * Init a new mm. Used on mm copies, like at fork()
>  * and on mm's that are brand-new, like at execve().
> @@ -161,6 +163,13 @@ static inline int init_new_context(struct task_struct *tsk,
> 		mm->context.execute_only_pkey = -1;
> 	}
> #endif
> +
> +#ifdef CONFIG_CPU_SUP_AMD
> +	INIT_LIST_HEAD(&mm->context.broadcast_asid_list);
> +	mm->context.broadcast_asid = 0;
> +	mm->context.asid_transition = false;
> +#endif

We've been inconsistent about it, but I think I'd prefer that this had a:

	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
		...
	}

wrapper as opposed to CONFIG_CPU_SUP_AMD. It might save dirtying a
cacheline on all the CPUs that don't care. cpu_feature_enabled() would
also function the same as the #ifdef.

> 	mm_reset_untag_mask(mm);
> 	init_new_context_ldt(mm);
> 	return 0;
> @@ -170,6 +179,9 @@ static inline int init_new_context(struct task_struct *tsk,
> static inline void destroy_context(struct mm_struct *mm)
> {
> 	destroy_context_ldt(mm);
> +#ifdef CONFIG_CPU_SUP_AMD
> +	destroy_context_free_broadcast_asid(mm);
> +#endif
> }
>
> extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 20074f17fbcd..5e9956af98d1 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -65,6 +65,23 @@ static inline void cr4_clear_bits(unsigned long mask)
>  */
> #define TLB_NR_DYN_ASIDS	6
>
> +#ifdef CONFIG_CPU_SUP_AMD
> +#define is_dyn_asid(asid)		(asid) < TLB_NR_DYN_ASIDS
> +#define is_broadcast_asid(asid)		(asid) >= TLB_NR_DYN_ASIDS
> +#define in_asid_transition(info)	(info->mm && info->mm->context.asid_transition)
> +#define mm_broadcast_asid(mm)		(mm->context.broadcast_asid)
> +#else
> +#define is_dyn_asid(asid)		true
> +#define is_broadcast_asid(asid)		false
> +#define in_asid_transition(info)	false
> +#define mm_broadcast_asid(mm)		0

I think it was said elsewhere, but I also prefer static inlines for
these instead of macros. The type checking that you get from the
compiler in _both_ compile configurations is much more valuable than
brevity.
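Completely untested, but something along these lines is the shape I
have in mind. IS_ENABLED() keeps the current #else behavior while
letting the compiler see (and type-check) both branches:

	static inline bool is_dyn_asid(u16 asid)
	{
		if (!IS_ENABLED(CONFIG_CPU_SUP_AMD))
			return true;

		return asid < TLB_NR_DYN_ASIDS;
	}

	static inline bool is_broadcast_asid(u16 asid)
	{
		if (!IS_ENABLED(CONFIG_CPU_SUP_AMD))
			return false;

		return asid >= TLB_NR_DYN_ASIDS;
	}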
...

> +	/*
> +	 * TLB consistency for this ASID is maintained with INVLPGB;
> +	 * TLB flushes happen even while the process isn't running.
> +	 */

I'm not sure this comment helps much. The thing that matters here is
that a broadcast ASID is assigned from a global namespace and not from
a per-cpu namespace.

> +#ifdef CONFIG_CPU_SUP_AMD
> +	if (static_cpu_has(X86_FEATURE_INVLPGB) && mm_broadcast_asid(next)) {
> +		*new_asid = mm_broadcast_asid(next);
> +		*need_flush = false;
> +		return;
> +	}
> +#endif
> +
> 	if (this_cpu_read(cpu_tlbstate.invalidate_other))
> 		clear_asid_other();
>
> @@ -251,6 +265,245 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
> 	*need_flush = true;
> }
>
> +#ifdef CONFIG_CPU_SUP_AMD
> +/*
> + * Logic for AMD INVLPGB support.
> + */

This comment is another indication that this shouldn't all be crammed
under CONFIG_CPU_SUP_AMD.

> +static DEFINE_RAW_SPINLOCK(broadcast_asid_lock);
> +static u16 last_broadcast_asid = TLB_NR_DYN_ASIDS;
> +static DECLARE_BITMAP(broadcast_asid_used, MAX_ASID_AVAILABLE) = { 0 };

I'm debating whether this should be a bitmap for "broadcast" ASIDs
alone or for all ASIDs.

> +static LIST_HEAD(broadcast_asid_list);
> +static int broadcast_asid_available = MAX_ASID_AVAILABLE - TLB_NR_DYN_ASIDS - 1;
> +
> +static void reset_broadcast_asid_space(void)
> +{
> +	mm_context_t *context;
> +
> +	lockdep_assert_held(&broadcast_asid_lock);
> +
> +	/*
> +	 * Flush once when we wrap around the ASID space, so we won't need
> +	 * to flush every time we allocate an ASID for boradcast flushing.

							  ^ broadcast

> +	 */
> +	invlpgb_flush_all_nonglobals();
> +	tlbsync();
> +
> +	/*
> +	 * Leave the currently used broadcast ASIDs set in the bitmap, since
> +	 * those cannot be reused before the next wraparound and flush..
> +	 */
> +	bitmap_clear(broadcast_asid_used, 0, MAX_ASID_AVAILABLE);
> +	list_for_each_entry(context, &broadcast_asid_list, broadcast_asid_list)
> +		__set_bit(context->broadcast_asid, broadcast_asid_used);
> +
> +	last_broadcast_asid = TLB_NR_DYN_ASIDS;
> +}

'TLB_NR_DYN_ASIDS' is special here. Could it please be made more clear
what it means *logically*?

> +static u16 get_broadcast_asid(void)
> +{
> +	lockdep_assert_held(&broadcast_asid_lock);
> +
> +	do {
> +		u16 start = last_broadcast_asid;
> +		u16 asid = find_next_zero_bit(broadcast_asid_used, MAX_ASID_AVAILABLE, start);
> +
> +		if (asid >= MAX_ASID_AVAILABLE) {
> +			reset_broadcast_asid_space();
> +			continue;
> +		}
> +
> +		/* Try claiming this broadcast ASID. */
> +		if (!test_and_set_bit(asid, broadcast_asid_used)) {
> +			last_broadcast_asid = asid;
> +			return asid;
> +		}
> +	} while (1);
> +}

I think it was said elsewhere, but the "try" logic doesn't make a lot
of sense to me when it's all protected by a global lock.
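With broadcast_asid_lock held across the whole thing, nothing can race
with find_next_zero_bit(), so the bit can just be set unconditionally.
Untested, and assuming the broadcast_asid_available accounting
guarantees a free ASID after a wraparound reset, something like:

	static u16 get_broadcast_asid(void)
	{
		u16 asid;

		lockdep_assert_held(&broadcast_asid_lock);

		asid = find_next_zero_bit(broadcast_asid_used,
					  MAX_ASID_AVAILABLE,
					  last_broadcast_asid);
		if (asid >= MAX_ASID_AVAILABLE) {
			/* Wrapped around: flush, then rescan. */
			reset_broadcast_asid_space();
			asid = find_next_zero_bit(broadcast_asid_used,
						  MAX_ASID_AVAILABLE,
						  last_broadcast_asid);
		}

		/* The lock is held; this cannot race with anyone. */
		__set_bit(asid, broadcast_asid_used);
		last_broadcast_asid = asid;

		return asid;
	}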
> +/*
> + * Returns true if the mm is transitioning from a CPU-local ASID to a broadcast
> + * (INVLPGB) ASID, or the other way around.
> + */
> +static bool needs_broadcast_asid_reload(struct mm_struct *next, u16 prev_asid)
> +{
> +	u16 broadcast_asid = mm_broadcast_asid(next);
> +
> +	if (broadcast_asid && prev_asid != broadcast_asid)
> +		return true;
> +
> +	if (!broadcast_asid && is_broadcast_asid(prev_asid))
> +		return true;
> +
> +	return false;
> +}
> +
> +void destroy_context_free_broadcast_asid(struct mm_struct *mm)
> +{
> +	if (!mm->context.broadcast_asid)
> +		return;
> +
> +	guard(raw_spinlock_irqsave)(&broadcast_asid_lock);
> +	mm->context.broadcast_asid = 0;
> +	list_del(&mm->context.broadcast_asid_list);
> +	broadcast_asid_available++;
> +}
> +
> +static bool mm_active_cpus_exceeds(struct mm_struct *mm, int threshold)
> +{

This function is pretty important. It's kinda missing a comment about
its theory of operation.

> +	int count = 0;
> +	int cpu;
> +
> +	if (cpumask_weight(mm_cpumask(mm)) <= threshold)
> +		return false;

There's a lot of potential redundancy between this check and the one
below. I assume this sequence was designed for performance: first, do a
cheap, one-stop-shopping check on mm_cpumask(). If it looks ok, then go
marauding around in a bunch of per_cpu() cachelines in a much more
expensive but precise search. Could we spell some of that out
explicitly, please?

> +	for_each_cpu(cpu, mm_cpumask(mm)) {
> +		/* Skip the CPUs that aren't really running this process. */
> +		if (per_cpu(cpu_tlbstate.loaded_mm, cpu) != mm)
> +			continue;

This is the only place I know of where 'cpu_tlbstate' is read from a
non-local CPU. This is fundamentally racy as hell and needs some heavy
commenting about why this raciness is OK.
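To be concrete, I'm looking for something along these lines. Whether
this justification actually holds is for you to confirm; I'm guessing
the result only feeds a heuristic:

	/*
	 * Reading cpu_tlbstate.loaded_mm from a remote CPU is racy:
	 * the remote CPU can switch to or away from 'mm' the moment
	 * after the read. That is OK here because the count is only
	 * used as a heuristic for whether 'mm' is active on enough
	 * CPUs to deserve a broadcast ASID. A stale read can make
	 * the switch-over happen one invocation earlier or later,
	 * but cannot cause incorrect TLB flushing.
	 */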