On 20/02/25 09:38, Dave Hansen wrote:
> On 2/20/25 09:10, Valentin Schneider wrote:
>>> The LDT and maybe the PEBS buffers are the only implicit supervisor
>>> accesses to vmalloc()'d memory that I can think of. But those are both
>>> handled specially and shouldn't ever get zapped while in use. The LDT
>>> replacement has its own IPIs separate from TLB flushing.
>>>
>>> But I'm actually not all that worried about accesses while actually
>>> running userspace. It's that "danger zone" in the kernel between entry
>>> and when the TLB might have dangerous garbage in it.
>>>
>> So say we have kPTI, thus no vmalloc() mapped in CR3 when running
>> userspace, and do a full TLB flush right before switching to userspace -
>> could the TLB still end up with vmalloc()-range-related entries when we're
>> back in the kernel and going through the danger zone?
>
> Yes, because the danger zone includes the switch back to the kernel CR3
> with vmalloc() fully mapped. All bets are off about what's in the TLB
> the moment that CR3 write occurs.
>
> Actually, you could probably use that.
>
> If a mapping is in the PTI user page table, you can't defer the flushes
> for it. Basically the same rule for text poking in the danger zone.
>
> If there's a deferred flush pending, make sure that all of the
> SWITCH_TO_KERNEL_CR3's fully flush the TLB. You'd need something similar
> to user_pcid_flush_mask.
>

Right, that's what I (roughly) had in mind...

> But, honestly, I'm still not sure this is worth all the trouble. If
> folks want to avoid IPIs for TLB flushes, there are hardware features
> that *DO* that. Just get new hardware instead of adding this complicated
> pile of software that we have to maintain forever. In 10 years, we'll
> still have this software *and* 95% of our hardware has the hardware
> feature too.

... But yeah, it pretty much circumvents arch_context_tracking_work, or at
the very least adds an early(er) flushing of the context tracking work... Urgh.

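To check my understanding of the two rules above, here is a rough user-space
model of the idea, not kernel code. Every name in it (struct cpu_model,
request_kernel_flush(), model_switch_to_kernel_cr3(), ...) is made up for
illustration, the counters stand in for real IPIs and TLB flushes, and it
deliberately ignores the race between a remote CPU sampling in_userspace and
the target CPU entering the kernel, which is the hard part in practice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4

/* Hypothetical per-CPU state; the kernel would use real per-CPU variables. */
struct cpu_model {
	atomic_bool deferred_flush_pending; /* set remotely instead of an IPI */
	bool in_userspace;                  /* modelled context-tracking state */
	unsigned long ipis;                 /* modelled flush IPIs received */
	unsigned long full_flushes;         /* modelled full TLB flushes */
};

static struct cpu_model cpus[NR_MODEL_CPUS];

/*
 * Remote side: a kernel mapping changed and @cpu may cache it.
 * Returns true if the flush was deferred, false if an "IPI" was sent.
 *
 * Rule 1: a mapping present in the PTI user page table can never be
 * deferred, since it stays reachable while the CPU runs userspace.
 */
static bool request_kernel_flush(int cpu, bool mapped_in_pti_user_table)
{
	struct cpu_model *c;

	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	c = &cpus[cpu];

	if (!c->in_userspace || mapped_in_pti_user_table) {
		c->ipis++;
		c->full_flushes++;
		return false;
	}

	/* NOTE: the real in_userspace check would race with kernel entry. */
	atomic_store(&c->deferred_flush_pending, true);
	return true;
}

/*
 * Rule 2: stand-in for SWITCH_TO_KERNEL_CR3. If a flush was deferred,
 * do a full flush as part of the CR3 switch, before anything in the
 * danger zone can rely on a stale translation.
 */
static void model_switch_to_kernel_cr3(int cpu)
{
	struct cpu_model *c = &cpus[cpu];

	c->in_userspace = false;
	if (atomic_exchange(&c->deferred_flush_pending, false))
		c->full_flushes++;
}

/* Stand-in for the return-to-user path. */
static void model_return_to_user(int cpu)
{
	cpus[cpu].in_userspace = true;
}
```

The pending flag plays the role that something like user_pcid_flush_mask
would play, and the exchange on entry is what makes a deferred request cost
exactly one flush no matter how many requests piled up while the CPU was in
userspace.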
Thank you for grounding my wild ideas in reality. I'll try to think some more and see if there is any other way out (other than "buy hardware that does what you want and ditch the one that doesn't").