On 19/02/25 12:25, Dave Hansen wrote:
> On 2/19/25 07:13, Valentin Schneider wrote:
>>> Maybe I missed part of the discussion though. Is VMEMMAP your only
>>> concern? I would have guessed that the more generic vmalloc()
>>> functionality would be harder to pin down.
>> Urgh, that'll teach me to send emails that late - I did indeed mean the
>> vmalloc() range, not at all VMEMMAP. IIUC *neither* are present in the user
>> kPTI page table and AFAICT the page table swap is done before the actual vmap'd
>> stack (CONFIG_VMAP_STACK=y) gets used.
>
> OK, so rewriting your question... ;)
>
>> So what if the vmalloc() range *isn't* in the CR3 tree when a CPU is
>> executing in userspace?
>
> The LDT and maybe the PEBS buffers are the only implicit supervisor
> accesses to vmalloc()'d memory that I can think of. But those are both
> handled specially and shouldn't ever get zapped while in use. The LDT
> replacement has its own IPIs separate from TLB flushing.
>
> But I'm actually not all that worried about accesses while actually
> running userspace. It's that "danger zone" in the kernel between entry
> and when the TLB might have dangerous garbage in it.
>

So say we have kPTI, thus no vmalloc() mapped in CR3 when running
userspace, and do a full TLB flush right before switching to userspace -
could the TLB still end up with vmalloc()-range-related entries when we're
back in the kernel and going through the danger zone?

> BTW, I hope this whole thing is turned off on 32-bit. There, we can
> actually take and handle faults on the vmalloc() area. If you get one of
> those faults in your "danger zone", it'll start running page fault code
> which will branch out to god-knows-where and certainly isn't noinstr.

Sounds... Fun. Thanks for pointing out the landmines.