On 2018-09-20 3:25 PM, Mikulas Patocka wrote:
On Thu, 20 Sep 2018, John David Anglin wrote:
On 2018-09-20 1:40 PM, Mikulas Patocka wrote:
But I think this isn't sufficient - see my other email. There's this race
possible if we drop the lock from the fault handler:
CPU1 (executing a page fault handler):
load *pte from pagetable
CPU2 (executing ptep_set_wrprotect)
spin_lock_irqsave(&pa_tlb_lock, flags);
set_pte(ptep, pte_wrprotect(*ptep));
purge_tlb_entries(mm, addr);
spin_unlock_irqrestore(&pa_tlb_lock, flags);
CPU1 (continuing the page fault handler)
insert pte into TLB - now CPU1 is running with stale pte entry in TLB
See my comment about locking and TLB inserts. I think we have fairly
deterministic timing in TLB handler
Cache misses take hundreds of cycles. You'd have only deterministic
behavior if you locked the TLB handler in the cache.
Maybe the handlers need to be cache line aligned. Probably would help
but maybe with
aliasing a line could still get flushed. I don't think there's a way to
lock lines in the cache.
but the lock doesn't ensure that the release won't occur occur before
the insert. Here, I'm assuming we still lock in TLB handler.
In the current state (with the lock), this race condition can't happen. It
would be:
CPU1 (executing a page fault handler):
lock pa_tlb_lock
load *pte from pagetable
insert pte into TLB
unlock pa_tlb_lock
The above two operations might be interchanged when viewed from CPU2.
Is there time
for CPU2 to get lock and purge TLB entry before insert?
The unlock operation will execute after the last access but not be
retired until after insert.
CPU2 (executing ptep_set_wrprotect)
spin_lock_irqsave(&pa_tlb_lock, flags);
set_pte(ptep, pte_wrprotect(*ptep));
purge_tlb_entries(mm, addr);
spin_unlock_irqrestore(&pa_tlb_lock, flags);
--- the purge_tlb_entries function would purge the translation that was
loaded by CPU1, so there will be no stale entry in the TLB. The lock is
needed even if the TLB handler doesn't modify the pagetable.
I understand the issue. I'm just not sure the lock solves the issue on
PA 2.0. Maybe that's why
local purge instructions were added in PA 2.0. If that's the case, then
the purge_tlb_entries
line needs to be after the unlock. You are convincing me that using IPI
and local purges may
be the only correct solution.
I'll run some tests with the locks in the TLB handler and routines in
pgtable.h updated.
--
John David Anglin dave.anglin@xxxxxxxx