On Mon Jan 23, 2023 at 6:16 PM AEST, Nadav Amit wrote:
>
>
> On 1/19/23 6:22 AM, Nicholas Piggin wrote:
> > On Thu Jan 19, 2023 at 8:22 AM AEST, Nadav Amit wrote:
> >>
> >>
> >>> On Jan 18, 2023, at 12:00 AM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> >>>
> >>> +static void do_shoot_lazy_tlb(void *arg)
> >>> +{
> >>> +	struct mm_struct *mm = arg;
> >>> +
> >>> +	if (current->active_mm == mm) {
> >>> +		WARN_ON_ONCE(current->mm);
> >>> +		current->active_mm = &init_mm;
> >>> +		switch_mm(mm, &init_mm, current);
> >>> +	}
> >>> +}
> >>
> >> I might be out of touch - doesn’t a flush already take place when we free
> >> the page-tables, at least on common cases on x86?
> >>
> >> IIUC exit_mmap() would free page-tables, and whenever page-tables are
> >> freed, on x86, we do shootdown regardless to whether the target CPU TLB state
> >> marks is_lazy. Then, flush_tlb_func() should call switch_mm_irqs_off() and
> >> everything should be fine, no?
> >>
> >> [ I understand you care about powerpc, just wondering on the effect on x86 ]
> >
> > Now I come to think of it, Rik had done this for x86 a while back.
> >
> > https://lore.kernel.org/all/20180728215357.3249-10-riel@xxxxxxxxxxx/
> >
> > I didn't know about it when I wrote this, so I never dug into why it
> > didn't get merged. It might have missed the final __mmdrop races but
> > I'm not not sure, x86 lazy tlb mode is too complicated to know at a
> > glance. I would check with him though.
>
> My point was that naturally (i.e., as done today), when exit_mmap() is
> done, you release the page tables (not just the pages). On x86 it means
> that you also send shootdown IPI to all the *lazy* CPUs to perform a
> flush, so they would exit the lazy mode.
>
> [ this should be true for 99% of the cases, excluding cases where there
> were not page-tables, for instance ]
>
> So the patch of Rik, I think, does not help in the common cases,
> although it may perhaps make implicit actions more explicit in the code.

If that's what it does, then sure. IIRC x86 didn't use to work that way
long ago, but you would know what it does today. You might find it
doesn't need much arch change to work.

OTOH Andy has major problems with active_mm and some other x86
use-after-free weirdness that I wasn't able to comprehend. He'll be
NAKing the x86 implementation until that's all cleaned up, so better try
to understand what's going on with that first.

Thanks,
Nick