On Wed, Apr 13, 2022 at 12:36:11PM +0200, David Hildenbrand wrote:
> On 12.04.22 18:08, Dave Hansen wrote:
> > On 4/12/22 01:15, David Hildenbrand wrote:
> >> Can we simply automate this using a kthread or smth like that, which
> >> just traverses the free page lists and accepts pages (similar, but
> >> different to free page reporting)?
> >
> > That's definitely doable.
> >
> > The downside is that this will force premature consumption of physical
> > memory resources that the guest may never use. That's a particular
> > problem on TDX systems since there is no way for a VMM to reclaim guest
> > memory short of killing the guest.
>
> IIRC, the hypervisor will usually effectively populate all guest RAM
> either way right now.

No, that is not the usual case. By default QEMU/KVM backs guest RAM with
an anonymous mapping and faults memory in on demand (see the sketch at
the end of this mail). There is an option to pre-populate guest memory,
but it is not the default.

> So yes, for hypervisors that might optimize for
> that, that statement would be true. But I lost track how helpful it
> would be in the near future e.g., with the fd-based private guest memory
> -- maybe they already optimize for delayed acceptance of memory, turning
> it into delayed population.
>
> >
> > In other words, I can see a good argument either way:
> > 1. The kernel should accept everything to avoid the perf nastiness
> > 2. The kernel should accept only what it needs in order to reduce memory
> > use
> >
> > I'm kinda partial to #1 though, if I had to pick only one.
> >
> > The other option might be to tie this all to DEFERRED_STRUCT_PAGE_INIT.
> > Have the rule that everything that gets a 'struct page' must be
> > accepted. If you want to do delayed acceptance, you do it via
> > DEFERRED_STRUCT_PAGE_INIT.
>
> That could also be an option, yes. At least being able to choose would be
> good. But IIRC, DEFERRED_STRUCT_PAGE_INIT will still make the system get
> stuck during boot and wait until everything was accepted.

Right. Deferred page init still has to be done before init runs, so boot
still waits for all memory to be accepted.

> I see the following variants:
>
> 1) Slow boot; after boot, all memory is already accepted.
> 2) Fast boot; after boot, all memory will slowly but steadily get
>    accepted in the background. After a while, all memory is accepted and
>    can be signaled to user space.
> 3) Fast boot; after boot, memory gets accepted on demand. This is what
>    we have in this series.
>
> I somehow don't quite like 3), but with deferred population in the
> hypervisor, it might just make sense.

Conceptually, 3) is no different from what happens now. The first time a
normal VM touches a page (like on handling __GFP_ZERO), the page gets
allocated on the host, and that can take a very long time if it kicks in
direct reclaim on the host. The only difference is that accepting memory
is *usually* slower.

I guess we can make a case for offering 1) as an option, to match the
pre-populated use case for normal VMs.

Frankly, I think option 2) is the worst of the three. It still steals CPU
cycles from the workload after boot to do a job that may or may not be
needed. It is a half-measure that helps nobody.
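To illustrate the fault-in behaviour, here's a minimal userspace sketch.
It is not QEMU code, just the same mmap() pattern: an anonymous mapping
costs nearly no physical memory until it is touched, while MAP_POPULATE
pre-faults everything up front, which is roughly what opt-in
pre-population of guest memory amounts to:

#define _GNU_SOURCE		/* for MAP_ANONYMOUS/MAP_POPULATE */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define RAM_SIZE (256UL << 20)	/* 256 MiB of pretend guest RAM */

/* Report resident set size so the population effect is visible. */
static void print_rss(const char *when)
{
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "VmRSS:", 6))
			printf("%-26s%s", when, line);
	fclose(f);
}

int main(void)
{
	/* Default case: nothing is populated until first touch. */
	char *lazy = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (lazy == MAP_FAILED)
		return 1;
	print_rss("after lazy mmap:");

	lazy[0] = 1;		/* faults in a single page */
	print_rss("after touching one page:");

	/* Opt-in pre-population: all pages are faulted in up front. */
	char *eager = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
			   -1, 0);
	if (eager == MAP_FAILED)
		return 1;
	print_rss("after MAP_POPULATE mmap:");

	munmap(lazy, RAM_SIZE);
	munmap(eager, RAM_SIZE);
	return 0;
}

The first mmap() should barely move VmRSS, the single store adds one
page, and the MAP_POPULATE mapping adds the whole 256 MiB at once. Lazy
acceptance in the guest just stacks on top of this host-side laziness.

-- 
Kirill A. Shutemov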