On Fri, Jul 21, 2017 at 03:44:13PM +1000, Paul Mackerras wrote: > Commit f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB > ioctl() to change HPT size", 2016-12-20) changed the behaviour of > the KVM_PPC_ALLOCATE_HTAB ioctl so that it now allocates a new HPT > and new revmap array if there was a previously-allocated HPT of a > different size from the size being requested. In this case, we need > to reset the rmap arrays of the memslots, because the rmap arrays > will contain references to HPTEs which are no longer valid. Worse, > these references are also references to slots in the new revmap > array (which parallels the HPT), and the new revmap array contains > random contents, since it doesn't get zeroed on allocation. > > The effect of having these stale references to slots in the revmap > array that contain random contents is that subsequent calls to > functions such as kvmppc_add_revmap_chain will crash because they > will interpret the non-zero contents of the revmap array as HPTE > indexes and thus index outside of the revmap array. This leads to > host crashes such as the following. > > [ 7072.862122] Unable to handle kernel paging request for data at address 0xd000000c250c00f8 > [ 7072.862218] Faulting instruction address: 0xc0000000000e1c78 > [ 7072.862233] Oops: Kernel access of bad area, sig: 11 [#1] > [ 7072.862286] SMP NR_CPUS=1024 > [ 7072.862286] NUMA > [ 7072.862325] PowerNV > [ 7072.862378] Modules linked in: kvm_hv vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 mlx5_ib ib_core ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler powernv_op_panel i2c_opal nfsd auth_rpcgss oid_registry > [ 7072.863085] nfs_acl lockd grace sunrpc kvm_pr kvm xfs libcrc32c scsi_dh_alua dm_service_time radeon lpfc nvme_fc nvme_fabrics nvme_core scsi_transport_fc i2c_algo_bit tg3 drm_kms_helper ptp pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm dm_multipath i2c_core cxgb3 mlx5_core mdio [last unloaded: kvm_hv] > [ 7072.863381] CPU: 72 PID: 56929 Comm: qemu-system-ppc Not tainted 4.12.0-kvm+ #59 > [ 7072.863457] task: c000000fe29e7600 task.stack: c000001e3ffec000 > [ 7072.863520] NIP: c0000000000e1c78 LR: c0000000000e2e3c CTR: c0000000000e25f0 > [ 7072.863596] REGS: c000001e3ffef560 TRAP: 0300 Not tainted (4.12.0-kvm+) > [ 7072.863658] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]> > [ 7072.863667] CR: 44082882 XER: 20000000 > [ 7072.863767] CFAR: c0000000000e2e38 DAR: d000000c250c00f8 DSISR: 42000000 SOFTE: 1 > GPR00: c0000000000e2e3c c000001e3ffef7e0 c000000001407d00 d000000c250c00f0 > GPR04: d00000006509fb70 d00000000b3d2048 0000000003ffdfb7 0000000000000000 > GPR08: 00000001007fdfb7 00000000c000000f d0000000250c0000 000000000070f7bf > GPR12: 0000000000000008 c00000000fdad000 0000000010879478 00000000105a0d78 > GPR16: 00007ffaf4080000 0000000000001190 0000000000000000 0000000000010000 > GPR20: 4001ffffff000415 d00000006509fb70 0000000004091190 0000000ee1881190 > GPR24: 0000000003ffdfb7 0000000003ffdfb7 00000000007fdfb7 c000000f5c958000 > GPR28: d00000002d09fb70 0000000003ffdfb7 d00000006509fb70 d00000000b3d2048 > [ 7072.864439] NIP [c0000000000e1c78] kvmppc_add_revmap_chain+0x88/0x130 > [ 7072.864503] LR [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0 > [ 7072.864566] Call Trace: > [ 7072.864594] [c000001e3ffef7e0] [c000001e3ffef830] 0xc000001e3ffef830 (unreliable) > [ 7072.864671] [c000001e3ffef830] [c0000000000e2e3c] kvmppc_do_h_enter+0x84c/0x9e0 > [ 7072.864751] [c000001e3ffef920] [d00000000b38d878] kvmppc_map_vrma+0x168/0x200 [kvm_hv] > [ 7072.864831] [c000001e3ffef9e0] [d00000000b38a684] kvmppc_vcpu_run_hv+0x1284/0x1300 [kvm_hv] > [ 7072.864914] [c000001e3ffefb30] [d00000000f465664] kvmppc_vcpu_run+0x44/0x60 [kvm] > [ 7072.865008] [c000001e3ffefb60] [d00000000f461864] kvm_arch_vcpu_ioctl_run+0x114/0x290 [kvm] > [ 7072.865152] [c000001e3ffefbe0] [d00000000f453c98] kvm_vcpu_ioctl+0x598/0x7a0 [kvm] > [ 7072.865292] [c000001e3ffefd40] [c000000000389328] do_vfs_ioctl+0xd8/0x8c0 > [ 7072.865410] [c000001e3ffefde0] [c000000000389be4] SyS_ioctl+0xd4/0x130 > [ 7072.865526] [c000001e3ffefe30] [c00000000000b760] system_call+0x58/0x6c > [ 7072.865644] Instruction dump: > [ 7072.865715] e95b2110 793a0020 7b4926e4 7f8a4a14 409e0098 807c000c 786326e4 7c6a1a14 > [ 7072.865857] 935e0008 7bbd0020 813c000c 913e000c <93a30008> 93bc000c 48000038 60000000 > [ 7072.866001] ---[ end trace 627b6e4bf8080edc ]--- > > Note that to trigger this, it is necessary to use a recent upstream > QEMU (or other userspace that resizes the HPT at CAS time), specify > a maximum memory size substantially larger than the current memory > size, and boot a guest kernel that does not support HPT resizing. > > This fixes the problem by resetting the rmap arrays when the old HPT > is freed. > > Fixes: f98a8bf9ee20 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size") > Cc: stable@xxxxxxxxxxxxxxx # v4.11+ > Signed-off-by: Paul Mackerras <paulus@xxxxxxxxxx> Reviewed-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> > --- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 710e491..1c10e26 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -164,8 +164,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order) > goto out; > } > > - if (kvm->arch.hpt.virt) > + if (kvm->arch.hpt.virt) { > kvmppc_free_hpt(&kvm->arch.hpt); > + kvmppc_rmap_reset(kvm); > + } > > err = kvmppc_allocate_hpt(&info, order); > if (err < 0) -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Attachment:
signature.asc
Description: PGP signature