A couple of high-level things we need to address:

First, I'm OK calling this approach "broadcast TLB invalidation". But I don't think the ASIDs should be called "broadcast ASIDs". I'd much rather they be called something that makes it clear they come from a different namespace than the existing ASIDs. After this series there will be three classes:

  0:                   Special ASID used for the kernel, basically
  1->TLB_NR_DYN_ASIDS: Allocated from private, per-cpu space.
                       Meaningless when compared between CPUs.
  >TLB_NR_DYN_ASIDS:   Allocated from shared, kernel-wide space.
                       All CPUs share this space and must all agree
                       on what the values mean.

The fact that the "shared" ones are system-wide obviously allows INVLPGB to be used. The hardware feature also obviously "broadcasts" things more than plain old INVLPG did. But I don't think that makes the ASIDs "broadcast" ASIDs. It's much more important to know that they are shared across the system instead of per-cpu than to know that, deep in the implementation, they are managed with an instruction the hardware "broadcasts".

So can we call them "global", "shared" or "system" ASIDs, please?

Second, TLB_NR_DYN_ASIDS was picked because it's roughly the number of distinct PCIDs that the CPU can keep in the TLB at once (at least on Intel). Let's say a CPU has 6 mm's in the per-cpu ASID space and another 6 in the shared/broadcast space. At that point, PCIDs might not be doing much good because the TLB can't hold entries for 12 PCIDs. Does this series account for that at all? Should we be indexing cpu_tlbstate.ctxs[] by a *context* number rather than by the ASID it's running as?

Last, I'm not 100% convinced we want to do this whole thing. The will-it-scale numbers are nice. But given the complexity of this, I think we need some actual, real end users to stand up and say exactly why this is important to them in *PRODUCTION*.
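To make the naming point concrete, here is a minimal C sketch of the three ASID classes from the first point, written as a classification helper. The enum, its values, and both function names are hypothetical (they use the "global" naming suggested above, not anything from the series), and TLB_NR_DYN_ASIDS is assumed to be 6 purely for illustration. The boundaries follow the list exactly as written in the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define TLB_NR_DYN_ASIDS 6

/* Hypothetical names for the three ASID namespaces. */
enum asid_class {
	ASID_KERNEL,	/* 0: special ASID used for the kernel */
	ASID_PER_CPU,	/* 1..TLB_NR_DYN_ASIDS: private, per-cpu space */
	ASID_GLOBAL,	/* >TLB_NR_DYN_ASIDS: shared, system-wide space */
};

/* Map a raw ASID value to the namespace it was allocated from. */
static enum asid_class classify_asid(uint16_t asid)
{
	if (asid == 0)
		return ASID_KERNEL;
	if (asid <= TLB_NR_DYN_ASIDS)
		return ASID_PER_CPU;
	return ASID_GLOBAL;
}

/*
 * Only ASIDs from the shared space have the same meaning on every CPU.
 * A per-cpu ASID is meaningless when compared between CPUs.
 */
static bool asid_is_system_wide(uint16_t asid)
{
	return classify_asid(asid) == ASID_GLOBAL;
}
```

The helper's name asks "is this ASID shared across the system?" rather than "is this a broadcast ASID?", which is the property callers actually care about; INVLPGB is just one consequence of it.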