Re: HV KVM fails on 970 due to HTAB allocation

On 13.01.2012, at 06:11, Paul Mackerras wrote:

> On Thu, Jan 12, 2012 at 07:16:51PM +0100, Alexander Graf wrote:
> 
>> While trying to run HV KVM for something useful on 970, we stumbled
>> over the following code path:
>> 
>>        /* Allocate guest's hashed page table */
>>        hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
>>                               HPT_ORDER - PAGE_SHIFT);
>>        if (!hpt) {
>>                pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
>>                return -ENOMEM;
>>        }
>>        kvm->arch.hpt_virt = hpt;
>> 
>> Most of the time we run into the !hpt case, because we simply don't
>> have 16MB of contiguous memory lying around.
>> 
>> I was trying to check whether we could maybe allocate a hugetlb page
>> from within kernel space, since that usually matches the 16MB pretty
>> well. However, that seems to be very tricky. Maybe something similar
>> to the RMA thing would be a good idea?
> 
> In discussing this with David Gibson in the past, one idea we have had
> is to have userspace allocate the HPT using hugetlbfs and supply it to
> KVM via an ioctl.  If userspace doesn't call that ioctl then we try to
> do a high-order allocation, as at present, when they do the first
> VCPU_RUN ioctl.

At which point user space has complete control over the HPT, which translates guest EAs into host RAs? I don't think that's a good idea :).
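Just so we're talking about the same thing, I imagine the userspace side of that scheme would look roughly like the sketch below. KVM_PPC_SET_HPT, its ioctl number and its calling convention are all invented here for illustration; nothing like it exists in the ABI today.

/* Hypothetical userspace side: back the 16MB HPT with a huge page and
 * hand its address to KVM.  KVM_PPC_SET_HPT is invented for this
 * sketch and is not part of the current KVM ABI. */
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define HPT_SIZE	(16UL << 20)
#define KVM_PPC_SET_HPT	_IOW(KVMIO, 0xff, unsigned long)	/* invented */

static int give_hpt_to_kvm(int vm_fd)
{
	void *hpt;

	/* One 16MB huge page; needs hugetlbfs set up with pages of
	 * that size (the default huge page size on ppc64). */
	hpt = mmap(NULL, HPT_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (hpt == MAP_FAILED)
		return -1;

	/* Tell KVM where the guest HPT lives. */
	return ioctl(vm_fd, KVM_PPC_SET_HPT, (unsigned long)hpt);
}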

> The other thing the code could do is to fall back to lower-order
> allocations.  The HPT doesn't have to be 16MB in size; any power of 2
> that is at least 256kB will do (there is an upper limit, but it is
> enormous).  Smaller sizes will potentially reduce performance, of
> course (and the size of the VRMA on POWER7, but on 970 we have to use
> an RMO region, which isn't affected by the HPT size).

Which means that a guest could potentially run slower due to random circumstances on the host. In other words, benchmarking right after bootup will be fast, while benchmarking after two weeks of system uptime will be slow. This really should be the last resort.
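For reference, a sketch of that fallback against the snippet quoted above; this assumes HPT_ORDER is 24 (16MB) and PAGE_SHIFT is 12, and the hpt_order field is made up so the chosen size can be remembered per VM.

	/* Sketch only: retry the HPT allocation with progressively
	 * smaller orders instead of failing outright.  1UL << 18 ==
	 * 256kB is the minimum HPT size mentioned above. */
	long order = HPT_ORDER;			/* 24, i.e. 16MB */
	unsigned long hpt = 0;

	while (order >= 18) {
		hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO |
				       __GFP_REPEAT | __GFP_NOWARN,
				       order - PAGE_SHIFT);
		if (hpt)
			break;
		order--;
	}
	if (!hpt) {
		pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
		return -ENOMEM;
	}

	kvm->arch.hpt_virt = hpt;
	kvm->arch.hpt_order = order;	/* invented field */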

Maybe we should do something similar to the RMA allocator, where at boot time we define how many VMs we want to preallocate memory for? I really don't like that either, but I can't think of a better approach atm.
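Something along these lines, say; the kvm_hpt_count parameter and all the names below are invented for the sketch, and locking is omitted.

/* Sketch of a boot-time HPT pool along the lines of the RMA
 * preallocation.  kvm_hpt_count would come from a kernel command line
 * parameter (e.g. "kvm_hpt_count=4"). */
#include <linux/bootmem.h>

struct kvm_hpt_info {
	unsigned long virt;
	int in_use;
};

static unsigned long kvm_hpt_count;
static struct kvm_hpt_info *kvm_hpt_pool;

void __init kvm_hpt_preallocate(void)
{
	unsigned long i;

	kvm_hpt_pool = alloc_bootmem(kvm_hpt_count * sizeof(*kvm_hpt_pool));
	for (i = 0; i < kvm_hpt_count; i++)
		kvm_hpt_pool[i].virt = (unsigned long)
			__alloc_bootmem(1UL << HPT_ORDER, 1UL << HPT_ORDER, 0);
}

/* Hand out one preallocated HPT, or 0 if the pool is exhausted and the
 * caller has to fall back to __get_free_pages(). */
unsigned long kvm_hpt_take(void)
{
	unsigned long i;

	for (i = 0; i < kvm_hpt_count; i++) {
		if (!kvm_hpt_pool[i].in_use) {
			kvm_hpt_pool[i].in_use = 1;
			return kvm_hpt_pool[i].virt;
		}
	}
	return 0;
}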


Alex


