On 03/12/2024 12:06, Michal Hocko wrote:
> On Mon 02-12-24 14:50:49, Frank van der Linden wrote:
>> On Mon, Dec 2, 2024 at 1:58 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>>> Any games with "background zeroing" are notoriously crappy and I would
>>> argue one should exhaust other avenues before going there -- at the end
>>> of the day the cost of zeroing will have to get paid.
>>
>> I understand that the concept of background prezeroing has been, and
>> will be, met with some resistance. But, do you have any specific
>> concerns with the patch I posted? It's pretty well isolated from the
>> rest of the code, and optional.
>
> The biggest concern I have is that the overhead is payed by everybody on
> the system - it is considered to be a system overhead regardless only
> part of the workload benefits from hugetlb pages. In other words the
> workload using those pages is not accounted for the use completely.
>
> If the startup latency is a real problem is there a way to workaround
> that in the userspace by preallocating hugetlb pages ahead of time
> before those VMs are launched and hand over already pre-allocated pages?

It should be relatively simple to actually do this. Me and Mike had
experimented ourselves a couple years back but we never had the chance
to send it over. IIRC if we:

- add the PageZeroed tracking bit when a page is zeroed
- clear it in the write (fixup/non-fixup) fault-path
  [somewhat similar to this series I suspect]

Then what's left is to change the lookup of free hugetlb pages
(dequeue_hugetlb_folio_node_exact() I think) to search first for
non-zeroed pages. Provided we don't track its 'cleared' state, there's
no UAPI change in behaviour. A daemon can just allocate/mmap+touch/etc
them with read-only and free them back 'as zeroed' to implement a
userspace scrubber. And in principle existing apps should see no
difference. The amount of changes is consequently significantly smaller
(or it looked as such in a quick PoC years back).

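To make the scrubber idea concrete, here is a rough userspace sketch (not
the PoC mentioned above). It only uses the existing mmap()/munmap() calls
with MAP_HUGETLB. The function name scrub_hugepages() and its parameters
are made up for illustration, and the "freed back as zeroed" part is an
assumption: it only holds on a kernel carrying the PageZeroed tracking
described above. On current kernels this just allocates and frees pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical scrubber pass: map npages huge pages read-only, read one
 * byte from each so the kernel faults in (and zeroes) a page, then unmap.
 * hpage_size is assumed to be the default huge page size of the system
 * (e.g. 2 MiB on x86-64); the caller has to supply it.
 *
 * Returns the number of pages touched, or -1 if the mapping failed
 * (for example because the hugetlb pool is empty).
 */
long scrub_hugepages(size_t npages, size_t hpage_size)
{
    size_t len = npages * hpage_size;
    const volatile unsigned char *base;
    unsigned char sink = 0;
    long touched = 0;
    void *p;

    if (len == 0)
        return 0;

    /* PROT_READ only: the write fault path is never taken. */
    p = mmap(NULL, len, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    base = p;
    for (size_t i = 0; i < npages; i++) {
        /* Read fault: allocates one huge page for this slot. */
        sink |= base[i * hpage_size];
        touched++;
    }
    (void)sink;

    /* With the proposed tracking, pages would go back 'as zeroed'. */
    munmap(p, len);
    return touched;
}
```

A daemon would call this in a loop with whatever batch size it can afford,
e.g. scrub_hugepages(16, 2UL << 20), and back off when it returns -1.
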
Something extra on top would perhaps be the ability to select a lookup
heuristic, such that we can pick the search method
(non-zero-first/only-nonzero/zeroed pages) behind an ioctl() (or a better
generic UAPI), to allow a scrubber to easily coexist with a hugepage user
(e.g. a VMM, etc) without too much of a dance.

Joao
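
For illustration only, the tracking bit plus the selectable lookup could be
modelled in plain userspace C as below. This is a toy model, not kernel
code: struct model_page, enum lookup_policy and the model_*() helpers are
invented names, and reading "zeroed pages" as "zeroed first, with fallback"
is my assumption about the third mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup policies; not an existing kernel or UAPI enum. */
enum lookup_policy {
    LOOKUP_NONZEROED_FIRST, /* prefer non-zeroed, fall back to zeroed */
    LOOKUP_NONZEROED_ONLY,  /* non-zeroed only, e.g. for a scrubber */
    LOOKUP_ZEROED_FIRST,    /* prefer zeroed, e.g. for a VMM (assumed) */
};

struct model_page {
    bool free;   /* page sits on the free list */
    bool zeroed; /* stands in for the PageZeroed tracking bit */
};

/*
 * Pick a free page according to policy and mark it allocated.
 * Returns the index of the page, or -1 if nothing suitable is free.
 */
int model_dequeue(struct model_page *pool, size_t n,
                  enum lookup_policy policy)
{
    bool want_zeroed = (policy == LOOKUP_ZEROED_FIRST);
    int fallback = -1;

    for (size_t i = 0; i < n; i++) {
        if (!pool[i].free)
            continue;
        if (pool[i].zeroed == want_zeroed) {
            pool[i].free = false;
            return (int)i;
        }
        if (fallback < 0)
            fallback = (int)i; /* first free page of the other kind */
    }

    if (policy == LOOKUP_NONZEROED_ONLY || fallback < 0)
        return -1;

    pool[fallback].free = false;
    return fallback;
}

/* A write fault dirties the page, so the tracking bit is cleared. */
void model_write_fault(struct model_page *pg)
{
    pg->zeroed = false;
}

/* Freeing keeps the zeroed state: only a write fault ever clears it. */
void model_free(struct model_page *pg)
{
    pg->free = true;
}
```

In this model a scrubber would dequeue with LOOKUP_NONZEROED_ONLY, so it
never competes for pages that are already zeroed, while existing users keep
the non-zeroed-first default and leave the zeroed pages for whoever asks
for them.
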