On Wed, Oct 03, 2018 at 03:39:13PM +1000, David Gibson wrote:
> On Tue, Oct 02, 2018 at 09:31:21PM +1000, Paul Mackerras wrote:
> > From: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>
> > @@ -367,7 +367,9 @@ struct kvmppc_pte {
> >          bool may_write          : 1;
> >          bool may_execute        : 1;
> >          unsigned long wimg;
> > +        unsigned long rc;
> >          u8 page_size;           /* MMU_PAGE_xxx */
> > +        u16 page_shift;
>
> It's a bit ugly that this has both page_size and page_shift, which is
> redundant information AFAICT. Also, why does page_shift need to be
> u16 - given that 2^255 bytes is much more than our supported address
> space, let alone a plausible page size.

These values are all essentially function outputs, so I don't think
it's ugly to have the same information in different forms. I actually
don't like using the MMU_PAGE_xxx values, because the information in
the mmu_psize_defs[] array depends on the MMU mode of the host, but
KVM needs to be able to work with guests in both MMU modes. More
generally I don't think it's a good idea that the KVM <-> guest
interface depends so much on what the host firmware tells us about the
physical machine we're on. Thus I'm trying to move away from using
MMU_PSIZE_xxx values and mmu_psize_defs[] in KVM code.

I'll change the type to u8.

> > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > index bd06a95..ee6f493 100644
> > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > @@ -29,43 +29,16 @@
> >   */
> >  static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 };
> >
> > -/*
> > - * Used to walk a partition or process table radix tree in guest memory
> > - * Note: We exploit the fact that a partition table and a process
> > - * table have the same layout, a partition-scoped page table and a
> > - * process-scoped page table have the same layout, and the 2nd
> > - * doubleword of a partition table entry has the same layout as
> > - * the PTCR register.
> > - */
> > -int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
> > -                                     struct kvmppc_pte *gpte, u64 table,
> > -                                     int table_index, u64 *pte_ret_p)
> > +int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
> > +                               struct kvmppc_pte *gpte, u64 root,
> > +                               u64 *pte_ret_p)
> >  {
> >          struct kvm *kvm = vcpu->kvm;
> >          int ret, level, ps;
> > -        unsigned long ptbl, root;
> > -        unsigned long rts, bits, offset;
> > -        unsigned long size, index;
> > -        struct prtb_entry entry;
> > +        unsigned long rts, bits, offset, index;
> >          u64 pte, base, gpa;
> >          __be64 rpte;
> >
> > -        if ((table & PRTS_MASK) > 24)
> > -                return -EINVAL;
> > -        size = 1ul << ((table & PRTS_MASK) + 12);
> > -
> > -        /* Is the table big enough to contain this entry? */
> > -        if ((table_index * sizeof(entry)) >= size)
> > -                return -EINVAL;
> > -
> > -        /* Read the table to find the root of the radix tree */
> > -        ptbl = (table & PRTB_MASK) + (table_index * sizeof(entry));
> > -        ret = kvm_read_guest(kvm, ptbl, &entry, sizeof(entry));
> > -        if (ret)
> > -                return ret;
> > -
> > -        /* Root is stored in the first double word */
> > -        root = be64_to_cpu(entry.prtb0);
>
> This refactoring somewhat obscures the changes directly relevant to
> the nested guest handling. Ideally it would be nice to fold some of
> this into the earlier reworkings.

True, but given the rapidly approaching merge window, I'm not inclined
to rework it.

> > +        if (ret) {
> > +                /* We didn't find a pte */
> > +                if (ret == -EINVAL) {
> > +                        /* Unsupported mmu config */
> > +                        flags |= DSISR_UNSUPP_MMU;
> > +                } else if (ret == -ENOENT) {
> > +                        /* No translation found */
> > +                        flags |= DSISR_NOHPTE;
> > +                } else if (ret == -EFAULT) {
> > +                        /* Couldn't access L1 real address */
> > +                        flags |= DSISR_PRTABLE_FAULT;
> > +                        vcpu->arch.fault_gpa = fault_addr;
> > +                } else {
> > +                        /* Unknown error */
> > +                        return ret;
> > +                }
> > +                goto resume_host;
>
> This is effectively forwarding the fault to L1, yes? In which case a
> different name might be better than the ambiguous "resume_host".

I'll change it to "forward_to_l1".

Paul.
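
For illustration only, here is a stand-alone user-space sketch of the
errno-to-flag mapping in the last quoted hunk, showing why
"forward_to_l1" describes the common exit better than "resume_host".
Everything prefixed SKETCH_/sketch_ is hypothetical: the flag values
are placeholders rather than the kernel's DSISR_* definitions, and the
fault_gpa bookkeeping done for -EFAULT in the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bits for illustration; NOT the kernel's DSISR_* values. */
#define SKETCH_DSISR_NOHPTE        (UINT32_C(1) << 0)
#define SKETCH_DSISR_UNSUPP_MMU    (UINT32_C(1) << 1)
#define SKETCH_DSISR_PRTABLE_FAULT (UINT32_C(1) << 2)

/* Hypothetical outcome of a failed table walk. */
enum sketch_action {
        SKETCH_FORWARD_TO_L1,   /* reflect the fault to the L1 hypervisor */
        SKETCH_RETURN_ERROR     /* unknown error: hand ret back to the caller */
};

/*
 * Classify a negative errno returned by the walk.
 * For the three recognised errors the matching flag is OR-ed into *flags
 * and the fault is forwarded to L1; any other value leaves *flags alone.
 */
static inline enum sketch_action
sketch_classify_walk_error(int ret, uint32_t *flags)
{
        assert(flags != NULL);

        switch (ret) {
        case -EINVAL:   /* unsupported MMU configuration */
                *flags |= SKETCH_DSISR_UNSUPP_MMU;
                return SKETCH_FORWARD_TO_L1;
        case -ENOENT:   /* no translation found */
                *flags |= SKETCH_DSISR_NOHPTE;
                return SKETCH_FORWARD_TO_L1;
        case -EFAULT:   /* could not access the L1 real address */
                *flags |= SKETCH_DSISR_PRTABLE_FAULT;
                return SKETCH_FORWARD_TO_L1;
        default:        /* unknown error */
                return SKETCH_RETURN_ERROR;
        }
}
```

All three recognised errors end at the same place, delivery of the
fault to L1, which is what the new label name says.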