On Mon, Nov 29, 2021 at 10:31:14AM -0800, Ben Gardon wrote:
> > As comment above handle_removed_tdp_mmu_page() showed, at this point IIUC
> > current thread should have exclusive ownership of this orphaned and abandoned
> > pgtable page, then why in handle_removed_tdp_mmu_page() we still need all the
> > atomic operations and REMOVED_SPTE tricks to protect from concurrent access?
> > Since that's cmpxchg-ed out of the old pgtable, what can be accessing it
> > besides the current thread?
>
> The cmpxchg does nothing to guarantee that other threads can't have a
> pointer to the page table, only that this thread knows it's the one
> that removed it from the page table. Other threads could still have
> pointers to it in two ways:
> 1. A kernel thread could be in the process of modifying an SPTE in the
> page table, under the MMU lock in read mode. In that case, there's no
> guarantee that there's not another kernel thread with a pointer to the
> SPTE until the end of an RCU grace period.

Right, I definitely missed that whole picture of the RCU usage.  Thanks.

> 2. There could be a pointer to the page table in a vCPU's paging
> structure caches, which are similar to the TLB but cache partial
> translations. These are also cleared out on TLB flush.

Could you elaborate on what the structure cache you mentioned is?  I thought
the processor page walker would just use the data cache (L1-L3) as pgtable
caches, in which case IIUC the invalidation happens when we do WRITE_ONCE(),
which will invalidate all the other data caches besides the writer core's.
But I could be completely missing something..

> Sean's recent series linked the RCU grace period and TLB flush in a
> clever way so that we can ensure that the end of a grace period
> implies that the necessary flushes have happened already, but we still
> need to clear out the disconnected page table with atomic operations.
> We need to clear it out mostly to collect dirty / accessed bits and
> update page size stats.

Yes, this sounds reasonable too.

--
Peter Xu
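
To make the cmpxchg point above concrete, here is a minimal user-space sketch with C11 atomics. It is not the KVM code: every demo_* name, the DEMO_REMOVED_SPTE value and the 0x1003 / 0x2003 entry values are made up for illustration, and a single thread simply replays one possible interleaving. It only shows that winning the cmpxchg tells the unlinking thread it is the remover, while a pointer loaded earlier by someone else stays usable, so the teardown swaps each entry atomically and a late update through the stale pointer fails on the marker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define DEMO_REMOVED_SPTE ((uint64_t)0x5a0)
#define DEMO_ENTRIES 4

struct demo_pt {
    _Atomic uint64_t spte[DEMO_ENTRIES];
};

/* Parent slot that points at the child table. */
static _Atomic(struct demo_pt *) demo_parent_slot;

/* Unlink the child. Returns 1 only for the caller whose cmpxchg succeeded. */
static int demo_unlink(struct demo_pt *child)
{
    struct demo_pt *expected = child;

    return atomic_compare_exchange_strong(&demo_parent_slot, &expected,
                                          (struct demo_pt *)NULL);
}

/* Swap every entry for the marker and OR together the old values. */
static uint64_t demo_teardown(struct demo_pt *child)
{
    uint64_t seen = 0;

    for (int i = 0; i < DEMO_ENTRIES; i++)
        seen |= atomic_exchange(&child->spte[i], DEMO_REMOVED_SPTE);
    return seen;
}

/* Update attempted through a pointer that was loaded before the unlink. */
static int demo_stale_update(struct demo_pt *stale, int idx,
                             uint64_t old_val, uint64_t new_val)
{
    return atomic_compare_exchange_strong(&stale->spte[idx], &old_val, new_val);
}

/* Replays one interleaving on a single thread; returns the collected bits. */
uint64_t demo_scenario(void)
{
    struct demo_pt child;

    for (int i = 0; i < DEMO_ENTRIES; i++)
        atomic_init(&child.spte[i], 0);
    atomic_store(&child.spte[0], 0x1003);      /* arbitrary "in use" entry */
    atomic_store(&demo_parent_slot, &child);

    /* "Thread B" loads the table pointer and an entry before the unlink. */
    struct demo_pt *stale = atomic_load(&demo_parent_slot);
    uint64_t b_old = atomic_load(&stale->spte[0]);

    /* "Thread A" wins the unlink; a second attempt loses. */
    assert(demo_unlink(&child) == 1);
    assert(demo_unlink(&child) == 0);

    /* B's pointer is still valid memory, so A tears down atomically. */
    uint64_t collected = demo_teardown(&child);

    /* B's late cmpxchg fails because the entry now holds the marker. */
    assert(demo_stale_update(stale, 0, b_old, 0x2003) == 0);
    return collected;
}
```

Calling demo_scenario() from a small main() is enough to try it. The sketch leaves out RCU and TLB flushing entirely; in the real code those are what bound how long such a stale pointer can stay live.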