On Mon, 30 Jan 2023 21:18:32 +0000,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
>
> On Fri, Jan 27, 2023 at 07:45:15AM -0800, Ricardo Koller wrote:
> > Hi Marc,
> >
> > On Thu, Jan 26, 2023 at 12:10 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > [...]
> >
> > > The one thing that would convince me to make it an option is the
> > > amount of memory this thing consumes. 512+ pages is a huge amount,
> > > and I'm not overly happy about that. Why can't this be a userspace
> > > visible option, selectable on a per VM (or memslot) basis?
> > >
> >
> > It should be possible. I am exploring a couple of ideas that could
> > help when the hugepages are not 1G (e.g., 2M). However, they add
> > complexity and I'm not sure they help much.
> >
> > (will be using PAGE_SIZE=4K to make things simpler)
> >
> > This feature pre-allocates 513 pages before splitting every 1G range.
> > For example, it converts 1G block PTEs into trees made of 513 pages.
> > When not using this feature, the same 513 pages would be allocated,
> > but lazily over a longer period of time.
> >
> > Eager-splitting pre-allocates those pages in order to split huge-pages
> > into fully populated trees, which is needed in order to use FEAT_BBM
> > and skip the expensive TLBI broadcasts. 513 is just the number of
> > pages needed to break a 1G huge-page.
> >
> > We could optimize for smaller huge-pages, like 2M, by splitting one
> > huge-page at a time: only preallocate one 4K page at a time. The
> > trick is how to know that we are splitting 2M huge-pages. We could
> > either get the vma pagesize or use hints from userspace. I'm not sure
> > that this is worth it though. The user will most likely want to split
> > big ranges of memory (>1G), so optimizing for smaller huge-pages only
> > converts the left into the right:
> >
> > alloc 1 page       |      | alloc 512 pages
> > split 2M huge-page |      | split 2M huge-page
> > alloc 1 page       |      | split 2M huge-page
> > split 2M huge-page |  =>  | split 2M huge-page
> >        ...                       ...
> > alloc 1 page       |      | split 2M huge-page
> > split 2M huge-page |      | split 2M huge-page
> >
> > Still thinking of what else to do.
>
> I think that Marc's suggestion of having userspace configure this is
> sound. After all, userspace _should_ know the granularity of the backing
> source it chose for guest memory.

Only if it is not using anonymous memory. That's the important
distinction.

> We could also interpret a cache size of 0 to signal that userspace wants
> to disable eager page split for a VM altogether. It is entirely possible
> that the user will want a differing QoS between slice-of-hardware and
> overcommitted VMs.

Absolutely. The overcommitted case would suffer from the upfront
allocation (these systems are usually very densely packed).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
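
For context, the 513-page figure discussed above follows directly from
the 4K-granule page-table arithmetic: a 1G block covers 512 2M-sized
entries, each of which needs its own page of PTEs, plus one page for
the PMD-level table itself. A minimal sketch in plain userspace C
(illustrative only, not kernel code; the SZ_* defines are local to the
example):

	#include <stdio.h>

	#define SZ_1G	(1UL << 30)
	#define SZ_2M	(1UL << 21)

	int main(void)
	{
		/* A 1G block maps 512 2M-sized PMD-level entries. */
		unsigned long pmd_entries = SZ_1G / SZ_2M;

		/*
		 * Splitting it all the way down to 4K PTEs takes one
		 * page for the PMD-level table plus one PTE-table page
		 * per 2M entry: 1 + 512 = 513 pages.
		 */
		unsigned long pages = 1 + pmd_entries;

		printf("pages needed to split a 1G block: %lu\n", pages);
		return 0;
	}

The same arithmetic is what makes the 2M case cheap by comparison: a
single 2M block only needs one PTE-table page, which is why splitting
one 2M huge-page at a time would only preallocate one 4K page at a time.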