Re: [PATCH 0/4] mm,memory_hotplug: allocate memmap from hotadded memory

Michal Hocko <mhocko@xxxxxxxxxx> · Wed, 3 Apr 2019 10:37:57 +0200

On Wed 03-04-19 10:17:26, David Hildenbrand wrote:
> On 03.04.19 10:12, Michal Hocko wrote:
> > On Wed 03-04-19 10:01:16, Oscar Salvador wrote:
> >> On Tue, Apr 02, 2019 at 02:48:45PM +0200, Michal Hocko wrote:
> >>> So what is going to happen when you hotadd two memblocks, the first one
> >>> holds memmaps, and then you want to hotremove (not just offline) it?
> >>
> >> If you hot-add two memblocks, this means that either:
> >>
> >> a) you hot-add a 256MB-memory-device (128MB per memblock)
> >> b) you hot-add two 128MB-memory-devices
> >>
> >> Either way, hot-removing only works for memory-device as a whole, so
> >> there is no problem.
> >>
> >> Vmemmaps are created per hot-add operation, which means that
> >> vmemmaps will be created for the hot-added range.
> >> And since hot-add/hot-remove operations work with the same granularity,
> >> there is no problem.
> > 
> > What prevents somebody from calling arch_add_memory() for a range
> > spanning multiple memblocks from a driver directly? In other words, aren't you
> 
> To drivers, we only expose add_memory() and friends. And I think this is
> a good idea.
> 
> > making assumptions about future usage based on the qemu usecase?
> > 
> 
> As I noted, we only have an issue if add_memory() and
> remove_memory() are called with different granularity. I gave two
> examples where this might not be the case, but we will have to look into
> the details.

It seems natural that the DIMM will be hot removed all at once, because
you cannot hot remove half of a DIMM, right? But I can envision that
people might want to hotremove a faulty part of a really large DIMM
because they would like to save some resources.

With different users asking for the hotplug functionality, I do not
think we want to make such a strong assumption as hotremove having
the same granularity as hotadd.
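
To make the concern concrete, here is a rough userspace sketch, not kernel
code. The struct and function names are made up for illustration, and the
4MB memmap size assumes 4KB pages and a 64-byte struct page for a 256MB
range. It models one hot-added range with its memmap packed at the start,
and checks whether a given removal would take the memmap away from memory
that stays behind.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK_SIZE (128UL << 20)	/* 128MB memblock, as in the example above */

/* Hypothetical model of one hot-added range; not a kernel structure. */
struct hotadd_range {
	unsigned long start;		/* start address of the hot-added range */
	unsigned long size;		/* total size, a multiple of BLOCK_SIZE */
	unsigned long memmap_len;	/* bytes of memmap packed at 'start' */
};

/*
 * Return true if [rm_start, rm_start + rm_size) can go away without
 * removing memmap that still describes memory left in the range.
 */
static bool remove_is_safe(const struct hotadd_range *r,
			   unsigned long rm_start, unsigned long rm_size)
{
	unsigned long rm_end = rm_start + rm_size;
	unsigned long r_end = r->start + r->size;
	unsigned long mm_end = r->start + r->memmap_len;

	assert(r->memmap_len <= r->size);

	/* Same granularity as hotadd: memmap and its users go together. */
	if (rm_start == r->start && rm_end == r_end)
		return true;

	/* Partial removal: only safe if it does not touch the memmap area. */
	return rm_start >= mm_end || rm_end <= r->start;
}
```

With a 256MB range added in one go (start 0, memmap_len 4MB), removing the
whole range is fine and so is removing only the second memblock, but
removing only the first memblock is not, because the memmap for the second
one lives there.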

That being said, it should be up to the caller of the hotplug code to
choose the vmemmap allocation strategy. For starters, I would only pack
vmemmaps for "regular" kernel zone memory. Movable zones should be handled
more carefully. We can always re-evaluate later when there is a strong
demand for huge pages on movable zones, but this is not the case now
because those pages are not really movable in practice.
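
Something along these lines is what I have in mind. This is only a sketch
of the policy: the enum and function names are hypothetical and do not
correspond to any existing kernel interface.

```c
#include <stdbool.h>

/* Hypothetical: what the caller of the hotplug code asks for. */
enum memmap_strategy {
	MEMMAP_FROM_SYSTEM,	/* allocate the memmap from already present memory */
	MEMMAP_FROM_HOTADDED,	/* pack the memmap into the hot-added range */
};

/* Hypothetical: coarse classification of where the memory will end up. */
enum zone_kind {
	ZONE_KIND_KERNEL,	/* "regular" kernel zone memory */
	ZONE_KIND_MOVABLE,	/* movable zone */
};

/*
 * Pack the memmap into the hot-added memory only when the caller asked
 * for it and the memory is regular kernel zone memory. Movable zones
 * keep the conservative behaviour for now.
 */
static bool use_hotadded_memmap(enum memmap_strategy requested,
				enum zone_kind zone)
{
	if (requested != MEMMAP_FROM_HOTADDED)
		return false;

	return zone == ZONE_KIND_KERNEL;
}
```

The point is that the decision is explicit at the call site rather than
derived from an assumption about how the range will be removed later.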
-- 
Michal Hocko
SUSE Labs