On Fri, 2025-01-10 at 07:14 -0800, Dave Hansen wrote:
> On 1/9/25 22:07, Nadav Amit wrote:
> > This is not my reading. I think that this reading assumes that
> > besides the broadcast, some new “range flush” was added to the
> > TLB. My guess is that this is not the case, since presumably it
> > would require a different TLB structure (and who does 2 changes
> > at once 😉 ).
>
> Reading it again, I think you're right.
>
> The INVLPG and INVLPGB language is too close. It would also _talk_
> about invalidating a range rather than just incrementing an address
> to invalidate.
>
> I think the key thing we need to decide is whether to treat a single
> INVLPGB(stride=8) more like a single INVLPGB or eight INVLPGBs.
> Measuring a bunch of invalidation loops should tell us that.

Would I be wrong to assume that the CPUs have some optimizations
built in to efficiently execute an invalidation for "everything
in a PCID"?

The "global invalidate" we send does not zap everything in the
TLB, but only the translations for a single PCID.

I suppose we should measure these things at some point (after
I do the other cleanups?), because the CPUs may well have made
a bunch of optimizations that we don't know about.

-- 
All Rights Reversed.