On Thu, Jun 17, 2021 at 05:01:53PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 07:00:26AM -0700, Andy Lutomirski wrote:
> > On Thu, Jun 17, 2021, at 6:51 AM, Mark Rutland wrote:
>
> > > It's not clear to me what "the right thing" would mean specifically, and
> > > on architectures with userspace cache maintenance JITs can usually do
> > > the most optimal maintenance, and only need help for the context
> > > synchronization.
> > >
> >
> > This I simply don't believe -- I doubt that any sane architecture
> > really works like this. I wrote an email about it to Intel that
> > apparently generated internal discussion but no results. Consider:
> >
> > mmap(some shared library, some previously unmapped address);
> >
> > this does no heavyweight synchronization, at least on x86. There is
> > no "serializing" instruction in the fast path, and it *works* despite
> > anything the SDM may or may not say.
>
> I'm confused; why do you think that is relevant?
>
> The only way to get into a memory address space is CR3 write, which is
> serializing and will flush everything. Since there wasn't anything
> mapped, nothing could be 'cached' from that location.
>
> So that has to work...

Ooh, you mean mmap where there was something mmap'ed before. Not virgin
space so to say.

But in that case, the unmap() would've caused a TLB invalidate, which on
x86 is IPIs, which is IRET.

Other architectures include I/D cache flushes in their TLB
invalidations -- but as elsewhere in the thread, that might not be
sufficient on its own.

But yes, I think TLBI has to imply flushing micro-arch instruction
related buffers for any of that to work.
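
For illustration only, here is a minimal userspace sketch of the "not virgin space" case: map a page, touch it, munmap() it, then mmap() a fresh page at the same address. The helper name `remap_reads_fresh_page` is made up for this example, and the sketch assumes Linux with anonymous private mappings. It only exercises the data side (loads from the remapped page), not instruction fetch, so it says nothing about the instruction-related buffers being debated; it merely shows the munmap()/mmap() sequence whose TLB invalidation is being relied on.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread.
 *
 * Returns  1 if the remapped page reads back as zeroes (no stale data seen),
 *          0 if stale contents are visible through the old address,
 *         -1 if any of the setup calls fail.
 */
int remap_reads_fresh_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    /* First mapping: let the kernel choose the address. */
    unsigned char *first = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (first == MAP_FAILED)
        return -1;

    /* Touch the page so that a translation for it is likely to be cached. */
    memset(first, 0xAA, len);

    /* The unmap is where the kernel has to invalidate the old translation. */
    if (munmap(first, len) != 0)
        return -1;

    /* Second mapping: MAP_FIXED forces reuse of the address just released. */
    unsigned char *second = mmap(first, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                                 -1, 0);
    if (second == MAP_FAILED)
        return -1;

    /* A fresh anonymous page must read as zero, never as the old 0xAA. */
    int fresh = 1;
    for (size_t i = 0; i < len; i++) {
        if (second[i] != 0) {
            fresh = 0;
            break;
        }
    }

    munmap(second, len);
    return fresh;
}
```

Calling `remap_reads_fresh_page()` from a small `main()` and checking for a return value of 1 is enough to exercise the sequence; an instruction-side variant would need architecture-specific code bytes and is deliberately left out.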