On Thu, 20 Sep 2018, John David Anglin wrote: > On 2018-09-20 1:40 PM, Mikulas Patocka wrote: > > > > But I think this isn't sufficient - see my other email. There's this race > > possible if we drop the lock from the fault handler: > > > > CPU1 (executing a page fault handler): > > load *pte from pagetable > > > > CPU2 (executing ptep_set_wrprotect) > > spin_lock_irqsave(&pa_tlb_lock, flags); > > set_pte(ptep, pte_wrprotect(*ptep)); > > purge_tlb_entries(mm, addr); > > spin_unlock_irqrestore(&pa_tlb_lock, flags); > > > > CPU1 (continuing the page fault handler) > > insert pte into TLB - now CPU1 is running with stale pte entry in TLB > > See my comment about locking and TLB inserts. I think we have fairly > deterministic timing in TLB handler Cache misses take hundreds of cycles. You'd have only deterministic behavior if you locked the TLB handler in the cache. > but the lock doesn't ensure that the release won't occur occur before > the insert. Here, I'm assuming we still lock in TLB handler. In the current state (with the lock), this race condition can't happen. It would be: CPU1 (executing a page fault handler): lock pa_tlb_lock load *pte from pagetable insert pte into TLB unlock pa_tlb_lock CPU2 (executing ptep_set_wrprotect) spin_lock_irqsave(&pa_tlb_lock, flags); set_pte(ptep, pte_wrprotect(*ptep)); purge_tlb_entries(mm, addr); spin_unlock_irqrestore(&pa_tlb_lock, flags); --- the purge_tlb_entries function would purge the translation that was loaded by CPU1, so there will be no stale entry in the TLB. The lock is needed even if the TLB handler doesn't modify the pagetable. Mikulas