On Mon 04-01-21 15:00:31, Dave Hansen wrote:
> On 1/4/21 12:11 PM, David Hildenbrand wrote:
> >> Yeah, it certainly can't be the default, but it *is* useful for
> >> things where we know that there are no cache benefits to zeroing
> >> close to where the memory is allocated.
> >>
> >> The trick is opting into it somehow, either in a process or a VMA.
> >>
> > The patch set is mostly trying to optimize starting a new process,
> > so process/VMA doesn't really work.
>
> Let's say you have a system-wide tunable that says: pre-zero pages and
> keep 10GB of them around. Then, you opt a process into being allowed
> to dip into that pool with a process-wide flag or an madvise() call.
> You could even have the flag be inherited across execve() if you wanted
> helper apps to be able to set the policy and access the pool, the way
> numactl works.

While possible, that sounds quite heavyweight to me. The page allocator
would have to maintain those pre-zeroed pages somehow, and the pool
would become a very scarce resource very quickly, because everybody
just wants to run faster. So this would open up many more interesting
questions. A global all-or-nothing knob sounds like an easier solution
to use and maintain to me.

> Dan makes a very good point about using filesystems for this, though.
> It wouldn't be rocket science to set up a special tmpfs mount just for
> VM memory and pre-zero it from userspace. For qemu, you'd need to
> teach the management layer to hand out zeroed files via mem-path=.

Agreed. That would be an interesting option.

> Heck, if you taught MADV_FREE how to handle tmpfs, you could even
> pre-zero *and* get the memory back quickly if those files ended up
> over-sized somehow.

We can probably allow MADV_FREE on shmem, but that would require an
exclusively mapped page. The shared case is really tricky because of
the risk of silent data corruption in uncoordinated userspace.

--
Michal Hocko
SUSE Labs
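
As a rough, non-authoritative sketch of the tmpfs approach Dave describes above: a small userspace helper could create a file on a dedicated tmpfs mount and touch every page up front, so the zeroing cost is paid before the guest starts (tmpfs pages are zero-filled on first fault). The mount point, file name, and size below are made-up examples, not anything from the thread.

```c
/*
 * Sketch only: pre-create and pre-fault a guest-memory file on a
 * tmpfs mount so its pages are already allocated and zeroed.
 * Path and size are hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/var/lib/vm-mem/guest0";	/* on a tmpfs mount */
	size_t size = 1UL << 30;			/* 1 GiB of guest RAM */

	int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ftruncate(fd, size)) {
		perror("ftruncate");
		return 1;
	}

	char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Touch every page so each shmem page is allocated (and zeroed) now. */
	for (size_t off = 0; off < size; off += 4096)
		mem[off] = 0;

	munmap(mem, size);
	close(fd);
	return 0;
}
```

A management layer could then hand such a file to qemu via its file-backed memory options (e.g. a memory-backend-file with mem-path= pointing at it, as mentioned above); the exact qemu wiring is beyond this sketch.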
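
For context on the MADV_FREE point: today the hint applies to private anonymous mappings, not shmem/tmpfs. The minimal sketch below (with an arbitrary size) only illustrates those existing semantics, not the hypothetical tmpfs support discussed above.

```c
/*
 * Illustration of current MADV_FREE semantics on private anonymous
 * memory: pages may be reclaimed lazily; a later write cancels the
 * hint, a later read may observe zeroes.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64UL << 20;	/* 64 MiB, arbitrary for the example */

	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(buf, 0xaa, len);		/* dirty the pages */

	/* Tell the kernel the contents are no longer needed. */
	if (madvise(buf, len, MADV_FREE)) {
		perror("madvise(MADV_FREE)");
		return 1;
	}

	munmap(buf, len);
	return 0;
}
```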