On 05/06/2014 11:26 AM, Benjamin Herrenschmidt wrote:
On Tue, 2014-05-06 at 11:12 +0200, Alexander Graf wrote:
So if I understand this patch correctly, it simply introduces logic to
handle page sizes other than 4k, 64k, 16M by analyzing the actual page
size field in the HPTE. Mind explaining why exactly that enables us to
use THP?
What exactly is the flow if the pages are not backed by huge pages? What
is the flow when they start to get backed by huge pages?
The hypervisor doesn't care about segments ... but it needs to properly
decode the page size requested by the guest, if anything, to issue the
right form of tlbie instruction.
The encoding in the HPTE for a 16M page inside a 64K segment is
different from the encoding for a 16M page in a 16M segment. This is done so
that the encoding carries both pieces of information, which allows broadcast
tlbie to properly find the right set in the TLB for invalidations, among
other things.
So from a KVM perspective, we don't know whether the guest is doing THP
or something else (Linux calls it THP, but all we care about here is that this
is MPSS; a guest OS other than Linux might exploit it differently).
Ugh. So we're just talking about a guest using MPSS here? Not about the
host doing THP? I must've missed that part.
What we do know is that if we advertise MPSS, we need to decode the page
sizes encoded in the HPTE so that we know what we are dealing with in
H_ENTER and can do the appropriate TLB invalidations in H_REMOVE &
evictions.
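A minimal C sketch of the decoding step described above, under loudly stated assumptions: the `enum psize`, the `penc` table, `struct hpte_psize`, and the helpers `decode_hpte_psize()` and `hpte_actual_bytes()` are all hypothetical, and the LP values are illustrative rather than taken from any processor manual. Real HPTEs also share the LP bits with low RPN bits; this sketch ignores that by taking an already-extracted LP value. It shows why the encodings must be distinct: the decoder never sees the segment, yet it recovers both the base size (which selects the tlbie form) and the actual size (which determines how much H_REMOVE must invalidate).

```c
#include <stdbool.h>

/* Hypothetical page-size indices, loosely modelled on Linux's MMU_PAGE_* values. */
enum psize { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };

static const unsigned int psize_shift[PSIZE_COUNT] = { 12, 16, 24 };

/*
 * penc[base][actual]: LP value stored in a large-page HPTE (L=1) for an
 * "actual" page inside a segment whose base page size is "base".
 * -1 means unsupported; 4K-in-4K uses L=0 and so has no LP value.
 * ILLUSTRATIVE values only.
 */
static const int penc[PSIZE_COUNT][PSIZE_COUNT] = {
	/*              4K   64K    16M */
	[PSIZE_4K]  = { -1, 0x07, 0x38 },
	[PSIZE_64K] = { -1, 0x01, 0x08 },
	[PSIZE_16M] = { -1,   -1, 0x00 },
};

/* Decoded pair: both halves are needed to issue the right tlbie. */
struct hpte_psize {
	enum psize base;
	enum psize actual;
};

/*
 * Decode page sizes from the HPTE alone, without knowing the segment.
 * ASSUMPTION: lp is the already-extracted LP value (RPN masking ignored).
 * Returns false if lp matches no supported (base, actual) combination.
 */
static bool decode_hpte_psize(bool large, int lp, struct hpte_psize *out)
{
	if (!large) {
		out->base = out->actual = PSIZE_4K;
		return true;
	}
	for (int b = 0; b < PSIZE_COUNT; b++) {
		for (int a = 0; a < PSIZE_COUNT; a++) {
			if (penc[b][a] >= 0 && penc[b][a] == lp) {
				out->base = (enum psize)b;
				out->actual = (enum psize)a;
				return true;
			}
		}
	}
	return false;
}

/* Bytes mapped by the HPTE, i.e. the range an H_REMOVE must invalidate. */
static unsigned long hpte_actual_bytes(struct hpte_psize p)
{
	return 1ul << psize_shift[p.actual];
}
```

With this illustrative table, an LP value of 0x08 decodes to a 16M page in a 64K segment, while 0x00 decodes to a 16M page in a 16M segment; both cover 16M, but they need different tlbie forms.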
Yes. That makes a lot of sense. So this patch really is all about
enabling MPSS support for 16MB pages. No more, no less.
+ if (a_size != -1)
+ return 1ul << mmu_psize_defs[a_size].shift;
+ }
+
+ }
+ return 0;
}
static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8227dba5af0f..a38d3289320a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
* support pte_enc here
*/
(*sps)->enc[0].pte_enc = def->penc[linux_psize];
+ /*
+ * Add 16MB MPSS support
+ */
+ if (linux_psize != MMU_PAGE_16M) {
+ (*sps)->enc[1].page_shift = 24;
+ (*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
+ }
So this basically indicates that every segment (except for the 16MB one)
can also handle 16MB MPSS page sizes? I suppose you want to remove the
comment in kvm_vm_ioctl_get_smmu_info_hv() that says we don't do MPSS here.
I haven't reviewed the code there; make sure it will indeed produce a
different encoding for every combination of segment/actual page size.
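The property asked for here can be checked mechanically. The sketch below is a hypothetical helper, `lp_encodings_unique()`, operating on an assumed table indexed by `[base][actual]` in which -1 marks unsupported combinations. It returns true only if no two supported combinations share an LP value, which is what a decoder that never sees the segment relies on. The enum and table layout are assumptions for illustration, not the kernel's data structures.

```c
#include <stdbool.h>

/* Hypothetical page-size indices for the [base][actual] table. */
enum psize { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };

/*
 * True if every supported (base, actual) combination has its own LP value.
 * The table is walked as a flat list of PSIZE_COUNT * PSIZE_COUNT cells;
 * -1 cells (unsupported) are skipped as the left-hand side of a comparison
 * and can never equal a valid (non-negative) value on the right-hand side.
 */
static bool lp_encodings_unique(const int table[PSIZE_COUNT][PSIZE_COUNT])
{
	const int n = PSIZE_COUNT * PSIZE_COUNT;

	for (int i = 0; i < n; i++) {
		int vi = table[i / PSIZE_COUNT][i % PSIZE_COUNT];

		if (vi < 0)
			continue;
		for (int j = i + 1; j < n; j++) {
			if (table[j / PSIZE_COUNT][j % PSIZE_COUNT] == vi)
				return false;
		}
	}
	return true;
}
```

A table in which a 16M page in a 64K segment reused the 16M-in-16M value would fail this check, because the decoder could no longer tell which tlbie form to use.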
Can we also ensure that every system we run on can do MPSS?
P7 and P8 are identical in that regard. However, 970 doesn't do MPSS, so
let's make sure we get that right.
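One possible way to keep 970 hosts correct, sketched under assumptions: rather than adding the 16M entry unconditionally for every non-16M segment, add it only when the host's encoding table actually has a valid 16M encoding for that base size. The structs `page_enc` and `seg_info` and the function `fill_seg_info()` are hypothetical stand-ins loosely modelled on the `kvmppc_add_seg_page_size()` hunk, not the KVM API, and the LP values used with it are illustrative.

```c
/* Hypothetical page-size indices, loosely modelled on Linux's MMU_PAGE_* values. */
enum psize { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };

static const unsigned int psize_shift[PSIZE_COUNT] = { 12, 16, 24 };

/* Hypothetical stand-ins for the per-size and per-segment descriptions. */
struct page_enc {
	unsigned int page_shift;
	int pte_enc;
};

struct seg_info {
	unsigned int page_shift;
	struct page_enc enc[2];
	int nr_enc;	/* number of valid entries in enc[] */
};

/*
 * Describe the segment whose base page size is "base".
 * host_penc[base][actual] is an ASSUMED host LP table (-1 = unsupported).
 * The 16M MPSS entry is added only if the host has an encoding for it,
 * so a CPU without MPSS (such as 970) reports the base size only.
 */
static void fill_seg_info(struct seg_info *s, enum psize base,
			  const int host_penc[PSIZE_COUNT][PSIZE_COUNT])
{
	s->page_shift = psize_shift[base];
	s->enc[0].page_shift = psize_shift[base];
	s->enc[0].pte_enc = host_penc[base][base];
	s->nr_enc = 1;

	if (base != PSIZE_16M && host_penc[base][PSIZE_16M] >= 0) {
		s->enc[1].page_shift = psize_shift[PSIZE_16M];
		s->enc[1].pte_enc = host_penc[base][PSIZE_16M];
		s->nr_enc = 2;
	}
}
```

Keying the check off the encoding table rather than a list of CPU models is a design choice: any host that lacks a 16M-in-64K encoding is handled the same way, without naming 970 explicitly.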
Yes. When (or if) people can easily get their hands on P7/P8 bare-metal
systems, I'll be more than happy to remove 970 support as well, but for
now it's probably good to keep it in.
Alex