On Wed, 18 Nov 2015 17:26:54 +0100
Peter Krempa <pkrempa@xxxxxxxxxx> wrote:

> On Wed, Nov 18, 2015 at 15:13:20 +0100, Andrea Bolognani wrote:
> > The amount of memory a ppc64 domain might need to lock is different
> > than that of a equally-sized x86 domain, so we need to check the
> > domain's architecture and act accordingly.
> >
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1273480
> > ---
> >  src/qemu/qemu_domain.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 79 insertions(+), 1 deletion(-)
> >
>
> ACK, although I'd like to hear David's opinion (cc'd).

So, as Andrea said, the text in the comments is mostly mine, so this
pretty much matches what I suggested.

I still haven't had a chance to investigate the original failing case
more deeply to see exactly what was going on, so I am concerned I might
have missed something.  But the code presented here is certainly closer
to correct than the previous code.  Even if I/we have missed some
things, the version Andrea suggests should have the right overall
structure, so it will be simpler to tweak than the old code.

I'll make a couple of extra points to help explain why Power has these
extra sources of locked memory, even without VFIO [1].

First, on x86 the guest's page tables exist within the guest's regular
memory space.  On Power, the PAPR paravirtualized environment has the
page table ("hash page table" [2]) outside the guest's memory space,
accessed an entry at a time via hypercalls.  The hash page table cannot
be swapped or paged itself, so it should be accounted as locked memory
(although it actually isn't right now).

Second, under PAPR, the guest always sees an IOMMU, and it's always
turned on (PAPR just doesn't have the concept of "no IOMMU").  On x86,
although the host uses an IOMMU to implement VFIO, it's not usually
visible to the guest.  Even when there is a guest-visible IOMMU on x86,
its page tables again exist within the guest memory space.
With PAPR the IOMMU page tables ("TCE tables") again exist outside the
guest memory space.  Those TCE tables can end up either in normal qemu
memory or in kernel memory, depending on what combination of VFIO and
our KVM IOMMU acceleration for emulated devices is in use.  But under at
least some combinations it's again unswappable memory, and so we should
account it as locked.

[1] Or at least, it might in future: Andrea is accounting for several
things that don't actually impact locked_vm now, but probably should.

[2] Complete aside.  The Power MMU works very differently from the x86
MMU (or indeed the MMU on any other arch I know of), using a hash table
to locate PTEs, rather than a radix tree (PGDs -> PUDs -> PMDs -> PTEs).
IBM Research were/are terribly proud of the design, which apparently had
significant advantages for big database loads with a widely scattered
working set - advantages which have been completely swamped by horrible
cache behaviour for most of the last 15 years.  It also requires a big
slab of physically contiguous memory for the hash table, which is a bit
of a pain for us.  Linux actually treats the hash page table as though
it were an enormous TLB, reloading it as necessary from radix-style page
tables.

-- 
David Gibson <dgibson@xxxxxxxxxx>
Senior Software Engineer, Virtualization, Red Hat
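
For illustration only, the accounting described in this message could be
sketched roughly as below.  This is not the code from the patch: the
struct, the function names and both sizing rules (a hash page table of
1/64 of guest RAM rounded up to a power of two, and one 8-byte TCE per
4 KiB IOMMU page) are assumptions made for the example, and the x86 case
is deliberately simplified to "guest RAM is locked only when VFIO is in
use".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a guest; not a libvirt structure. */
struct guest_cfg {
    uint64_t ram_kib;        /* guest RAM size */
    uint64_t dma_window_kib; /* DMA window covered by each TCE table */
    unsigned tce_tables;     /* number of TCE tables the guest has */
    bool vfio;               /* a host device is passed through via VFIO */
};

/* Smallest power of two >= v (returns 1 for v == 0). */
static uint64_t round_up_pow2(uint64_t v)
{
    uint64_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* ASSUMPTION: the hash page table is 1/64 of guest RAM, rounded up to
 * a power of two.  It lives outside guest memory and cannot be swapped. */
static uint64_t hpt_kib(uint64_t ram_kib)
{
    return round_up_pow2(ram_kib / 64);
}

/* ASSUMPTION: one 8-byte TCE per 4 KiB IOMMU page in the DMA window. */
static uint64_t tce_table_kib(uint64_t window_kib)
{
    uint64_t bytes = (window_kib / 4) * 8;
    return (bytes + 1023) / 1024;
}

/* Simplified x86 case: page tables live inside guest RAM, so nothing
 * extra is locked unless VFIO pins the guest's memory. */
static uint64_t x86_locked_kib(const struct guest_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->vfio ? cfg->ram_kib : 0;
}

/* PAPR case: hash page table and TCE tables count as locked even
 * without VFIO; guest RAM is added on top when VFIO is in use. */
static uint64_t papr_locked_kib(const struct guest_cfg *cfg)
{
    assert(cfg != NULL);
    uint64_t total = hpt_kib(cfg->ram_kib);
    total += (uint64_t)cfg->tce_tables * tce_table_kib(cfg->dma_window_kib);
    if (cfg->vfio)
        total += cfg->ram_kib;
    return total;
}
```

Under those assumptions, a 4 GiB guest with a single 2 GiB DMA window
and no VFIO would account 64 MiB for the hash page table plus 4 MiB for
the TCE table on Power, and nothing on x86.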
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list