On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote: > On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote: > > Observed the following oops while stress-testing, using multiple > > secureVM on a distro kernel. However this issue theoritically exists in > > 5.5 kernel and later. > > > > This issue occurs when the total number of requested device-PFNs exceed > > the total-number of available device-PFNs. PFN migration fails to > > allocate a device-pfn, which causes migrate_vma_finalize() to trigger > > kvmppc_uvmem_page_free() on a page, that is not associated with any > > device-pfn. kvmppc_uvmem_page_free() blindly tries to access the > > contents of the private data which can be null, leading to the following > > kernel fault. > > > > -------------------------------------------------------------------------- > > Unable to handle kernel paging request for data at address 0x00000011 > > Faulting instruction address: 0xc00800000e36e110 > > Oops: Kernel access of bad area, sig: 11 [#1] > > LE SMP NR_CPUS=2048 NUMA PowerNV > > .... > > MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> > > CR: 24424822 XER: 00000000 > > CFAR: c000000000e3d764 DAR: 0000000000000011 DSISR: 40000000 IRQMASK: 0 > > GPR00: c00800000e36e0a4 c000001f1d59f610 c00800000e38a400 0000000000000000 > > GPR04: c000001fa5000000 fffffffffffffffe ffffffffffffffff c000201fffeaf300 > > GPR08: 00000000000001f0 0000000000000000 0000000000000f80 c00800000e373608 > > GPR12: c000000000e3d710 c000201fffeaf300 0000000000000001 00007fef87360000 > > GPR16: 00007fff97db4410 c000201c3b66a578 ffffffffffffffff 0000000000000000 > > GPR20: 0000000119db9ad0 000000000000000a fffffffffffffffc 0000000000000001 > > GPR24: c000201c3b660000 c000001f1d59f7a0 c0000000004cffb0 0000000000000001 > > GPR28: 0000000000000000 c00a001ff003e000 c00800000e386150 0000000000000f80 > > NIP [c00800000e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv] > > LR [c00800000e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv] > > Call Trace: > > [c000000000512010] free_devmap_managed_page+0xd0/0x100 > > [c0000000003f71d0] put_devmap_managed_page+0xa0/0xc0 > > [c0000000004d24bc] migrate_vma_finalize+0x32c/0x410 > > [c00800000e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv] > > [c00800000e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv] > > [c00800000e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv] > > [c00800000e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv] > > [c00800000e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv] > > [c00800000e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm] > > [c00800000e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm] > > [c00800000e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm] > > [c0000000005451d0] do_vfs_ioctl+0xe0/0xaa0 > > [c000000000545d64] sys_ioctl+0xc4/0x160 > > [c00000000000b408] system_call+0x5c/0x70 > > Instruction dump: > > a12d1174 2f890000 409e0158 a1271172 3929ffff b1271172 7c2004ac 39200000 > > 913e0140 39200000 e87d0010 f93d0010 <89230011> e8c30000 e9030008 2f890000 > > -------------------------------------------------------------------------- > > > > Fix the oops.. > > > > fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests") > > Signed-off-by: Ram Pai <linuxram@xxxxxxxxxx> > > --- > > arch/powerpc/kvm/book3s_hv_uvmem.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c > > index 2806983..f4002bf 100644 > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > > @@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page) > > { > > unsigned long pfn = page_to_pfn(page) - > > (kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT); > > - struct kvmppc_uvmem_page_pvt *pvt; > > + struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data; > > + > > + if (!pvt) > > + return; > > > > spin_lock(&kvmppc_uvmem_bitmap_lock); > > bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1); > > spin_unlock(&kvmppc_uvmem_bitmap_lock); > > > > - pvt = page->zone_device_data; > > page->zone_device_data = NULL; > > if (pvt->remove_gfn) > > kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm); > > In our case, device pages that are in use are always associated with a valid > pvt member. See kvmppc_uvmem_get_page() which returns failure if it > runs out of device pfns and that will result in proper failure of > page-in calls. looked at the code, and yes that code path looks correct. So my reasoning behind the root cause of this bug is incorrect. However the bug is surfacing and there must be a reason. > > For the case where we run out of device pfns, migrate_vma_finalize() will > restore the original PTE and will not replace the PTE with device private PTE. > > Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never > called for non-device-private pages. Yes. it should not be called. But as seen above in the stack trace, it is called. What would cause the HMM to call ->page_free() on a page that is not associated with that device's pfn? > > This could be a use-after-free case possibly arising out of the new state > changes in HV. If so, this fix will only mask the bug and not address the > original problem. I can verify by rerunning the tests, without the new state changes. But I do not see how those changes can cause this fault? This could also be caused by a duplicate ->page_free() call due to some bug in the migrate_page path? Could there be a race between migrate_page() and a page_fault ? Regardless, kvmppc_uvmem_page_free() needs to be fixed. It should not access contents of pvt, without verifing pvt is valid. > > Regards, > Bharata. -- Ram Pai