On Thu, 29 Oct 2015 16:16:57 -0200
Eduardo Habkost <ehabkost@xxxxxxxxxx> wrote:

> (CCing Michal and libvir-list, so the libvirt team is aware of this
> restriction)
>
> On Thu, Oct 29, 2015 at 02:36:37PM +0100, Igor Mammedov wrote:
> > On Tue, 27 Oct 2015 14:36:35 -0200
> > Eduardo Habkost <ehabkost@xxxxxxxxxx> wrote:
> >
> > > On Tue, Oct 27, 2015 at 10:14:56AM +0100, Igor Mammedov wrote:
> > > > On Tue, 27 Oct 2015 10:53:08 +0200
> > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > >
> > > > > On Tue, Oct 27, 2015 at 09:48:37AM +0100, Igor Mammedov wrote:
> > > > > > On Tue, 27 Oct 2015 10:31:21 +0200
> > > > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > On Mon, Oct 26, 2015 at 02:24:32PM +0100, Igor Mammedov wrote:
> > > > > > > > Yep, it's a workaround, but it works around QEMU's broken virtio
> > > > > > > > implementation in a simple way without needing guest-side changes.
> > > > > > > >
> > > > > > > > Without a foreseeable virtio fix it makes memory hotplug unusable, and
> > > > > > > > even more so: even if there were a virtio fix, it wouldn't help old
> > > > > > > > guests, since you've said that a virtio fix would require changes on
> > > > > > > > both the QEMU and guest sides.
> > > > > > > What makes it not foreseeable?
> > > > > > > Apparently only the fact that we have a work-around in place, so no one
> > > > > > > works on it. I can code it up pretty quickly, but I'm flat out of time
> > > > > > > for testing as I'm going on vacation soon, and hard freeze is pretty
> > > > > > > close.
> > > > > > I can lend a hand for the testing part.
> > > > > >
> > > > > > > GPA space is kind of cheap, but wasting it in chunks of 512M
> > > > > > > seems way too aggressive.
> > > > > > The hotplug region is sized with a 1GB alignment reserve per DIMM, so we
> > > > > > aren't actually wasting anything here.
> > > > > >
> > > > > If I allocate two 1G DIMMs, what will be the gap size? 512M? 1G?
> > > > > It's too much either way.
> > > > The minimum would be 512M, and if the backend uses 1GB hugepages the gap
> > > > will be the backend's natural alignment (i.e. 1GB).
> > > Is backend configuration even allowed to affect the machine ABI? We need
> > > to be able to change backend configuration when migrating the VM to
> > > another host.
> > For now, one has to use the same type of backend on both sides,
> > i.e. if the source uses a 1GB hugepage backend then the target also
> > needs to use one.
>
> The page size of the backend doesn't even depend on QEMU arguments, but on
> the kernel command line or hugetlbfs mount options. So it's possible to
> have exactly the same QEMU command line on source and destination (with
> an explicit versioned machine type), and get a VM that can't be
> migrated? That means we are breaking our guarantees about migration and
> guest ABI.
>
> > We could change this for the next machine type to always force the
> > maximum alignment (1GB); then it would be possible to change
> > between backends with different alignments.
>
> I'm not sure what the best solution is here. If always using 1GB is too
> aggressive, we could require management to ask for an explicit alignment
> as a -machine option if they know they will need a specific backend page
> size.
>
> BTW, are you talking about the behavior introduced by
> aa8580cddf011e8cedcf87f7a0fdea7549fc4704 ("pc: memhp: force gaps between
> DIMM's GPA") only, or was the backend page size already affecting GPA
> allocation before that commit?
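To make the gap arithmetic above concrete, here is a rough standalone sketch; this is not actual QEMU code, and the base address, the 512M minimum-gap constant and the "gap = max(minimum gap, backend alignment)" model are illustrative assumptions that merely reproduce the 512M/1GB numbers discussed in this thread:

    /* gap-sketch.c: standalone illustration of the gap model discussed in
     * this thread; not QEMU code, all constants are made up */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MIB (1024ULL * 1024ULL)
    #define GIB (1024ULL * MIB)

    /* round addr up to the next multiple of align (align is a power of two) */
    static uint64_t align_up(uint64_t addr, uint64_t align)
    {
        return (addr + align - 1) & ~(align - 1);
    }

    /* gap a DIMM leaves after the previous one: at least min_gap, and at
     * least the backend's natural alignment (e.g. 1 GiB for 1G hugepages) */
    static uint64_t dimm_gap(uint64_t min_gap, uint64_t backend_align)
    {
        return backend_align > min_gap ? backend_align : min_gap;
    }

    /* pick the next DIMM's GPA: leave the gap, keep the start backend-aligned */
    static uint64_t next_dimm_gpa(uint64_t next_free, uint64_t min_gap,
                                  uint64_t backend_align)
    {
        return align_up(next_free + dimm_gap(min_gap, backend_align),
                        backend_align);
    }

    static void place_two_dimms(const char *label, uint64_t backend_align)
    {
        uint64_t base = 4 * GIB;        /* assumed start of the hotplug region */
        uint64_t dimm_size = 1 * GIB;
        uint64_t min_gap = 512 * MIB;   /* assumed minimum forced gap */
        uint64_t next_free = base;

        printf("%s:\n", label);
        for (int i = 0; i < 2; i++) {
            uint64_t gpa = next_dimm_gpa(next_free, min_gap, backend_align);
            printf("  DIMM%d at 0x%" PRIx64 ", gap before it: %" PRIu64 " MiB\n",
                   i, gpa, (gpa - next_free) / MIB);
            next_free = gpa + dimm_size;
        }
    }

    int main(void)
    {
        place_two_dimms("4K-page RAM backend", 4096);
        place_two_dimms("1G-hugepage backend", 1 * GIB);
        return 0;
    }

Built and run, it places two 1G DIMMs with 512 MiB gaps when the backend uses 4K pages and 1 GiB gaps when it uses 1G hugepages, i.e. the two cases asked about above.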
Backend alignment has been there since the beginning; we always over-reserve 1GB per slot, since we don't know in advance what alignment a hotplugged backend would require.
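As a minimal sketch of what that over-reservation amounts to when the hotplug window is sized (again, not QEMU's actual code; the function name and layout are illustrative, assuming the window must cover all hotpluggable RAM plus a worst-case 1 GiB alignment reserve per slot):

    /* hotplug-window-sketch.c: illustrative only, not QEMU code */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define GIB (1024ULL * 1024ULL * 1024ULL)

    /* size of the hotplug address-space window: all hotpluggable RAM plus a
     * worst-case 1 GiB alignment reserve for each configured slot, because
     * the backend (and hence alignment) of future DIMMs is unknown up front */
    static uint64_t hotplug_region_size(uint64_t maxram_size, uint64_t ram_size,
                                        unsigned int ram_slots)
    {
        return (maxram_size - ram_size) + (uint64_t)ram_slots * GIB;
    }

    int main(void)
    {
        /* e.g. -m 4G,maxmem=36G,slots=8 */
        printf("window: %" PRIu64 " GiB\n",
               hotplug_region_size(36 * GIB, 4 * GIB, 8) / GIB);
        return 0;
    }

With, say, -m 4G,maxmem=36G,slots=8 that comes out to 32 GiB of hotpluggable RAM plus an 8 GiB reserve, i.e. a 40 GiB window of GPA space; GPA space is only reserved, not backed by host memory, which is why the reserve itself isn't "wasting" anything.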