On Tue, Jun 20, 2023 at 12:33 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> On Sat May 27, 2023 at 9:44 AM AEST, Yu Zhao wrote:
> > KVM page tables are currently not RCU safe against remapping, i.e.,
> > kvmppc_unmap_free_pmd_entry_table() et al. The previous
>
> Minor nit but the "page table" is not RCU-safe against something. It
> is RCU-freed, and therefore some algorithm that accesses it can have
> the existence guarantee provided by RCU (usually there still needs
> to be more to it).
>
> > mmu_notifier_ops members rely on kvm->mmu_lock to synchronize with
> > that operation.
> >
> > However, the new mmu_notifier_ops member test_clear_young() provides
> > a fast path that does not take kvm->mmu_lock. To implement
> > kvm_arch_test_clear_young() for that path, orphan page tables need to
> > be freed by RCU.
>
> Short version: clear the referenced bit using RCU instead of MMU lock
> to protect against page table freeing, and there is no problem with
> clearing the bit in a table that has been freed.
>
> Seems reasonable.

Thanks. All above points taken.

> > Unmapping, specifically kvm_unmap_radix(), does not free page tables,
> > hence not a concern.
>
> Not sure if you really need to make the distinction about why the page
> table is freed, we might free them via unmapping. The point is just
> anything that frees them while there can be concurrent access, right?

Correct.

> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > ---
> >  arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > index 461307b89c3a..3b65b3b11041 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > @@ -1469,13 +1469,15 @@ int kvmppc_radix_init(void)
> >  {
> >          unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
> >
> > -        kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
> > +        kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
> > +                                          SLAB_TYPESAFE_BY_RCU, pte_ctor);
> >          if (!kvm_pte_cache)
> >                  return -ENOMEM;
> >
> >          size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
> >
> > -        kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
> > +        kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
> > +                                          SLAB_TYPESAFE_BY_RCU, pmd_ctor);
> >          if (!kvm_pmd_cache) {
> >                  kmem_cache_destroy(kvm_pte_cache);
> >                  return -ENOMEM;
>
> KVM PPC HV radix PUD level page tables use the arch/powerpc allocators
> (for some reason), which are not RCU freed. I think you need them too?

We don't. The use of the arch/powerpc allocator for PUD tables seems
appropriate to me because, unlike PMD/PTE tables, we never free PUD
tables during the lifetime of a VM:

* We don't free PUD/PMD/PTE tables when they become empty, i.e., not
  mapping any pages but still attached. (We could in theory, as
  x86/aarch64 do.)
* We have to free PMD/PTE tables when we replace them with 1GB/2MB
  pages. (Otherwise we'd lose track of detached tables.)

And we currently don't support huge pages at P4D level, so we never
detach and free PUD tables.
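
For reference, below is a minimal sketch of the reader-side pattern that
SLAB_TYPESAFE_BY_RCU enables. It is not the actual
kvm_arch_test_clear_young() implementation; the function name and the
generic pmd/pte helper usage are assumptions made purely for
illustration of the pattern.

/*
 * Illustrative sketch only -- not the actual KVM code. Memory behind a
 * page table allocated from a SLAB_TYPESAFE_BY_RCU cache is never
 * returned to the page allocator before an RCU grace period, so a
 * reader under rcu_read_lock() cannot fault on it. The worst case is
 * testing/clearing the accessed bit in a table that was concurrently
 * freed (and possibly reused as another page table), which is benign.
 */
#include <linux/rcupdate.h>
#include <linux/pgtable.h>

static bool test_young_lockless(pmd_t *pmdp, unsigned long addr)
{
        bool young = false;
        pmd_t pmd;
        pte_t *ptep;

        rcu_read_lock();

        pmd = READ_ONCE(*pmdp);
        if (!pmd_present(pmd) || pmd_leaf(pmd))
                goto out;

        /* The PTE table may be stale, but it is still valid memory. */
        ptep = (pte_t *)pmd_page_vaddr(pmd) + pte_index(addr);

        if (pte_young(READ_ONCE(*ptep)))
                young = true;
out:
        rcu_read_unlock();
        return young;
}

On the allocation side, the only change needed is the
SLAB_TYPESAFE_BY_RCU flag in the patch above: freed objects may be
reused immediately within the cache, but the backing pages are only
returned to the system after a grace period, so no explicit call_rcu()
is required when freeing a table.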