On Wed, Oct 03, 2018 at 03:39:13PM +1000, David Gibson wrote:
> On Tue, Oct 02, 2018 at 09:31:21PM +1000, Paul Mackerras wrote:
> > From: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>
> >
> > Consider a normal (L1) guest running under the main hypervisor (L0),
> > and then a nested guest (L2) running under the L1 guest which is acting
> > as a nested hypervisor. L0 has page tables to map the address space for
> > L1 providing the translation from L1 real address -> L0 real address;
> >
> >     L1
> >     |
> >     | (L1 -> L0)
> >     |
> >     ----> L0
> >
> > There are also page tables in L1 used to map the address space for L2
> > providing the translation from L2 real address -> L1 real address. Since
> > the hardware can only walk a single level of page table, we need to
> > maintain in L0 a "shadow_pgtable" for L2 which provides the translation
> > from L2 real address -> L0 real address. Which looks like;
> >
> >     L2                          L2
> >     |                           |
> >     | (L2 -> L1)                |
> >     |                           |
> >     ----> L1                    | (L2 -> L0)
> >           |                     |
> >           | (L1 -> L0)          |
> >           |                     |
> >           ----> L0              --------> L0
> >
> > When a page fault occurs while running a nested (L2) guest we need to
> > insert a pte into this "shadow_pgtable" for the L2 -> L0 mapping. To
> > do this we need to:
> >
> > 1. Walk the pgtable in L1 memory to find the L2 -> L1 mapping, and
> >    provide a page fault to L1 if this mapping doesn't exist.
> > 2. Use our L1 -> L0 pgtable to convert this L1 address to an L0 address,
> >    or try to insert a pte for that mapping if it doesn't exist.
> > 3. Now we have a L2 -> L0 mapping, insert this into our shadow_pgtable
> >
> > Once this mapping exists we can take rc faults when hardware is unable
> > to automatically set the reference and change bits in the pte. On these
> > we need to:
> >
> > 1. Check the rc bits on the L2 -> L1 pte match, and otherwise reflect
> >    the fault down to L1.
> > 2. Set the rc bits in the L1 -> L0 pte which corresponds to the same
> >    host page.
> > 3. Set the rc bits in the L2 -> L0 pte.
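
For readers following along, the two fault paths listed above can be
sketched in plain userspace C. This is only an illustration under loud
assumptions, not the code from the patch: the flat toy_pgtable arrays,
every toy_* name, the TOY_* result codes and the PTE_* flag values are
all hypothetical, and the real code reuses the radix page-table
functions in book3s_64_mmu_radix.c rather than indexing an array.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES    16      /* toy address space: 16 pages at every level */
#define PTE_VALID 0x1u    /* hypothetical flag values, not the hardware's */
#define PTE_R     0x2u    /* reference bit */
#define PTE_C     0x4u    /* change bit */

/* Toy single-level table: the array index is the source page number. */
struct toy_pte { uint32_t flags; uint32_t pfn; };
struct toy_pgtable { struct toy_pte pte[NPAGES]; };

enum toy_result {
	TOY_HANDLED,        /* shadow pte inserted or updated */
	TOY_REFLECT_TO_L1,  /* L1 has to handle the fault itself */
	TOY_NEED_L1_PAGE,   /* L0 would first have to map the L1 page */
};

/* Install a valid mapping from page 'from' to page 'to'. */
void toy_map(struct toy_pgtable *t, uint32_t from, uint32_t to)
{
	assert(from < NPAGES && to < NPAGES);
	t->pte[from].pfn = to;
	t->pte[from].flags = PTE_VALID;
}

/* Page fault in L2: compose L2 -> L1 and L1 -> L0 into the shadow table. */
enum toy_result toy_nested_page_fault(const struct toy_pgtable *l2_to_l1,
				      const struct toy_pgtable *l1_to_l0,
				      struct toy_pgtable *shadow,
				      uint32_t l2_pfn)
{
	const struct toy_pte *gpte, *hpte;

	assert(l2_pfn < NPAGES);

	/* Step 1: look up L2 -> L1; if absent, the fault belongs to L1. */
	gpte = &l2_to_l1->pte[l2_pfn];
	if (!(gpte->flags & PTE_VALID))
		return TOY_REFLECT_TO_L1;

	/* Step 2: look up L1 -> L0; a real L0 would try to insert it here. */
	hpte = &l1_to_l0->pte[gpte->pfn];
	if (!(hpte->flags & PTE_VALID))
		return TOY_NEED_L1_PAGE;

	/* Step 3: record the combined L2 -> L0 mapping in the shadow table. */
	shadow->pte[l2_pfn].pfn = hpte->pfn;
	shadow->pte[l2_pfn].flags = PTE_VALID;
	return TOY_HANDLED;
}

/* rc fault in L2: 'rc' is the set of PTE_R/PTE_C bits that must be set. */
enum toy_result toy_nested_rc_fault(const struct toy_pgtable *l2_to_l1,
				    struct toy_pgtable *l1_to_l0,
				    struct toy_pgtable *shadow,
				    uint32_t l2_pfn, uint32_t rc)
{
	const struct toy_pte *gpte;

	assert(l2_pfn < NPAGES);

	/* Step 1: L1's own pte must already carry these rc bits. */
	gpte = &l2_to_l1->pte[l2_pfn];
	if (!(gpte->flags & PTE_VALID) || (gpte->flags & rc) != rc)
		return TOY_REFLECT_TO_L1;

	/* Step 2: set rc in the L1 -> L0 pte for the same host page. */
	l1_to_l0->pte[gpte->pfn].flags |= rc;

	/* Step 3: set rc in the L2 -> L0 shadow pte. */
	shadow->pte[l2_pfn].flags |= rc;
	return TOY_HANDLED;
}
```

With L2 page 3 mapped to L1 page 5 and L1 page 5 mapped to L0 page 9,
toy_nested_page_fault() on L2 page 3 leaves a shadow entry pointing at
L0 page 9. A later toy_nested_rc_fault() for PTE_R is reflected to L1
until L1's own pte for that page has PTE_R set.
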
> >
> > As we reuse a large number of functions in book3s_64_mmu_radix.c for
> > this we also needed to refactor a number of these functions to take
> > an lpid parameter so that the correct lpid is used for tlb invalidations.
> > The functionality however has remained the same.
> >
> > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>
> > Signed-off-by: Paul Mackerras <paulus@xxxxxxxxxx>
>
> Some comments below, but no showstoppers, so,
>
> Reviewed-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>

One more, again not a showstopper:

> > @@ -393,10 +396,20 @@ struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid)
> >   */
> >  static void kvmhv_release_nested(struct kvm_nested_guest *gp)
> >  {
> > +	struct kvm *kvm = gp->l1_host;
> > +
> > +	if (gp->shadow_pgtable) {
> > +		/*
> > +		 * No vcpu is using this struct and no call to
> > +		 * kvmhv_remove_nest_rmap can find this struct,

It's kind of dubious that you're referring to kvmhv_remove_nest_rmap()
a patch before it is introduced.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson