On Tue, Apr 14, 2020 at 08:07:32AM -0700, Alexander Duyck wrote:
> On Tue, Apr 14, 2020 at 5:01 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > Having that said, I agree with Dave here, that there might be better
> > alternatives for this somewhat-special-case.
>
> I wonder if it wouldn't make more sense to look at the option of
> splitting the initialization work up over multiple CPUs instead of
> leaving it all single threaded. The data above was creating a VM with
> 64GB of RAM and 32 CPUs. How fast could we zero the pages if we were
> performing the zeroing over those 32 CPUs? I wonder if we couldn't
> look at recruiting other CPUs on the same node to perform the zeroing
> like what Dan had originally proposed for ZONE_DEVICE initialization a
> couple years ago[1].

This is exactly what I've done for VFIO.  Some performance results:

    https://lore.kernel.org/linux-mm/20181105165558.11698-10-daniel.m.jordan@xxxxxxxxxx/

and a semi-current branch is here if anyone wants to test it:

    https://lore.kernel.org/linux-mm/20200212224731.kmss6o6agekkg3mw@xxxxxxxxxxxxxxxxxxxxxxxxxx/

One of the issues with starting extra threads for paths triggered from
userspace such as VFIO is that they need to be properly throttled by
relevant resource controls such as cgroup (CPU controller especially) and
sched_setaffinity.  This type of control for kernel threads has another use
case too, async memcg reclaim.

All this is second on my list after I post a series that multithreads
deferred page init and sets up the basic infrastructure for multithreading
other paths, which I hope will be ready soon.

> [1]: https://lore.kernel.org/linux-mm/153077336359.40830.13007326947037437465.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

I haven't looked closely at memmap_init_zone, though I've tried
memmap_init_zone_device.  Will take a closer look to see how well this could
be incorporated.

Daniel