On Wed, Apr 13, 2022 at 12:36:11PM +0200, David Hildenbrand wrote:
> On 12.04.22 18:08, Dave Hansen wrote:
> > On 4/12/22 01:15, David Hildenbrand wrote:
> >> Can we simply automate this using a kthread or smth like that, which
> >> just traverses the free page lists and accepts pages (similar, but
> >> different to free page reporting)?
> >
> > That's definitely doable.
> >
> > The downside is that this will force premature consumption of physical
> > memory resources that the guest may never use. That's a particular
> > problem on TDX systems since there is no way for a VMM to reclaim guest
> > memory short of killing the guest.
>
> IIRC, the hypervisor will usually effectively populate all guest RAM
> either way right now.

No, that is not the usual case. By default QEMU/KVM backs guest RAM with
an anonymous mapping and faults memory in on demand (see the sketch at
the end of this mail). There is an option to pre-populate guest memory,
but it is not the default.

> So yes, for hypervisors that might optimize for
> that, that statement would be true. But I lost track how helpful it
> would be in the near future e.g., with the fd-based private guest memory
> -- maybe they already optimize for delayed acceptance of memory, turning
> it into delayed population.
>
> >
> > In other words, I can see a good argument either way:
> > 1. The kernel should accept everything to avoid the perf nastiness
> > 2. The kernel should accept only what it needs in order to reduce memory
> > use
> >
> > I'm kinda partial to #1 though, if I had to pick only one.
> >
> > The other option might be to tie this all to DEFERRED_STRUCT_PAGE_INIT.
> > Have the rule that everything that gets a 'struct page' must be
> > accepted. If you want to do delayed acceptance, you do it via
> > DEFERRED_STRUCT_PAGE_INIT.
>
> That could also be an option, yes. At least being able to choose would be
> good. But IIRC, DEFERRED_STRUCT_PAGE_INIT will still make the system get
> stuck during boot and wait until everything was accepted.

Right. Deferred page init still has to be done before init runs, so boot
still waits for all memory to be accepted.

> I see the following variants:
>
> 1) Slow boot; after boot, all memory is already accepted.
> 2) Fast boot; after boot, all memory will slowly but steadily get
>    accepted in the background. After a while, all memory is accepted and
>    can be signaled to user space.
> 3) Fast boot; after boot, memory gets accepted on demand. This is what
>    we have in this series.
>
> I somehow don't quite like 3), but with deferred population in the
> hypervisor, it might just make sense.

Conceptually, 3) is no different from what happens now. The first time a
normal VM touches a page (like on handling __GFP_ZERO), the page gets
allocated on the host, and that can take a very long time if it kicks in
direct reclaim on the host. The only difference is that accepting memory
is *usually* slower.

I guess we can make a case for offering 1) as an option, to match the
pre-populated use case for normal VMs.

Frankly, I think option 2) is the worst of the three. It still steals CPU
cycles from the workload after boot to do a job that may or may not be
needed. It is a half-measure that helps nobody.
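To illustrate the fault-in behaviour, here's a minimal userspace sketch.
It is not QEMU code, just the same mmap() pattern: an anonymous mapping
costs nearly no physical memory until it is touched, while MAP_POPULATE
pre-faults everything up front, which is roughly what opt-in
pre-population of guest memory amounts to:

#define _GNU_SOURCE		/* for MAP_ANONYMOUS/MAP_POPULATE */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define RAM_SIZE (256UL << 20)	/* 256 MiB of pretend guest RAM */

/* Report resident set size so the population effect is visible. */
static void print_rss(const char *when)
{
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "VmRSS:", 6))
			printf("%-26s%s", when, line);
	fclose(f);
}

int main(void)
{
	/* Default case: nothing is populated until first touch. */
	char *lazy = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (lazy == MAP_FAILED)
		return 1;
	print_rss("after lazy mmap:");

	lazy[0] = 1;		/* faults in a single page */
	print_rss("after touching one page:");

	/* Opt-in pre-population: all pages are faulted in up front. */
	char *eager = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
			   -1, 0);
	if (eager == MAP_FAILED)
		return 1;
	print_rss("after MAP_POPULATE mmap:");

	munmap(lazy, RAM_SIZE);
	munmap(eager, RAM_SIZE);
	return 0;
}

The first mmap() should barely move VmRSS, the single store adds one
page, and the MAP_POPULATE mapping adds the whole 256 MiB at once. Lazy
acceptance in the guest just stacks on top of this host-side laziness.

-- 
Kirill A. Shutemov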