> > > Hate to even bring this up, but there are complaints today about 'allocation
> > > time' of 1GB pages from the hugetlb pool. This 'allocation time' is actually
> > > the time it takes to clear/zero 1G of memory. Only reason I mention is
> > > using something like CMA to allocate 1G pages (at fault time) may add
> > > unacceptable latency.
> >
> > One solution I had in mind is that you could zero these 1GB pages at free
> > time in a worker thread, so that you do not pay the penalty at page allocation
> > time. But it would not work if the allocation comes right after a page is
> > freed.
>
> In addition, there were several proposals to speed up zeroing of huge pages:
>
> 1. X86 specific: Cannon Matthews proposed the "clear 1G pages with
> streaming stores on x86" change.
> https://lore.kernel.org/linux-mm/20200307010353.172991-1-cannonmatthews@xxxxxxxxxx
>
> This speeds up setting up 1G pages by roughly 4 times.
>
> 2. X86 specific: Kirill and Andi also proposed a similar
> change even earlier:
> https://lore.kernel.org/all/1345470757-12005-1-git-send-email-kirill.shutemov@xxxxxxxxxxxxxxx

Also, this one more recently from me:
https://lore.kernel.org/all/20220606202109.1306034-1-ankur.a.arora@xxxxxxxxxx/

Linus had some comments on the overall approach, and I sent out this
as a follow-up:
https://lore.kernel.org/all/20230403052233.1880567-1-ankur.a.arora@xxxxxxxxxx/

> 3. Arch Generic: Ktasks https://lwn.net/Articles/770826
> That allows zeroing HugeTLB pages in parallel.
>
> 4. VM Specific: https://lwn.net/Articles/931933/
> Allows lazily zeroing 1G pages in the guest.
>
> I looked through the (1) proposal and did not see any major pushbacks,
> I do not see why movnti can't be used specifically for gigantic pages.

AFAICT, the recent concerns are mostly around having a proper API, and
around encapsulating MOVNTI-like primitives such that they can be used
safely.

Ankur
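
For readers unfamiliar with the streaming-store idea discussed above, here
is a minimal userspace sketch of clearing a region with MOVNTI. It is not
taken from any of the linked patch series. The name clear_region_nt() is
hypothetical, and the sketch assumes the buffer is 8-byte aligned with a
length that is a multiple of 8. On non-x86-64 builds it simply falls back
to memset().

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* SSE2 intrinsics: _mm_stream_si64(), _mm_sfence() */
#endif

/*
 * Illustrative only: zero a buffer with non-temporal stores.
 * clear_region_nt() is a hypothetical name, not a kernel function.
 * Assumes dst is 8-byte aligned and len is a multiple of 8.
 */
static void clear_region_nt(void *dst, size_t len)
{
#if defined(__x86_64__)
	long long *p = dst;
	size_t i, n = len / sizeof(*p);

	for (i = 0; i < n; i++)
		_mm_stream_si64(&p[i], 0);	/* 64-bit non-temporal store (MOVNTI) */

	/* Non-temporal stores are weakly ordered; fence before publishing. */
	_mm_sfence();
#else
	memset(dst, 0, len);	/* fallback on other architectures */
#endif
}
```

The trailing fence is the kind of detail the API concerns are about: a
caller that forgets it may let other CPUs observe the page as ready before
the zeroes are globally visible.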