Re: [PATCH] mm: fix use-after free of page_ext after race with memory-offline

Michal Hocko <mhocko@xxxxxxxx> · Tue, 19 Jul 2022 17:37:14 +0200

On Tue 19-07-22 17:19:34, David Hildenbrand wrote:
> On 18.07.22 16:54, Michal Hocko wrote:
> > On Mon 18-07-22 19:28:13, Charan Teja Kalla wrote:
[...]
> >>> 3) Change the design where the page_ext is valid as long as the struct
> >>> page is alive.
> >>
> >> :/ Doesn't spark joy."
> > 
> > I would be wondering why. It should only take to move the callback to
> > happen at hotremove. So it shouldn't be very involved of a change. I can
> > imagine somebody would be relying on releasing resources when offlining
> > memory but is that really the case?
> 
> Various reasons:
> 
> 1) There was a discussion in the past to eventually also use rcu
> protection for handling pdn_to_online_page(). So doing it cleanly here
> is certainly an improvement.

Call me skeptical on that.

> 2) I really dislike having to scatter section online checks all over the
> place in page ext code. Once there is a difference between active vs.
> stale page ext data things get a bit messy and error prone. This is
> already ugly enough in our generic memmap handling code IMHO.

They should represent a free page in any case so even they are stall
they shouldn't be really dangerous, right? 

> 3) Having on-demand allocations, such as KASAN or page ext from the
> memory online notifier is at least currently cleaner, because we don't
> have to handle each and every subsystem that hooks into that during the
> core memory hotadd/remove phase, which primarily only setups the
> vmemmap, direct map and memory block devices.

Cannot this hook into __add_pages which is the real implementation of
the arch independent way to allocate vmemmap. Or at the sparsemem level
because we do not (and very likely won't) support memory hotplug on
any other memory model.

> Personally, I think what we have in this patch is quite nice and clean.
> But I won't object if it can be similarly done in a clean way from
> hot(un)plug code.

Well, if the scheme can be done without synchronize_rcu for each section
which can backfire and if the scheme doesn't add too much complexity to
achieve that then sure I won't object. I just do not get why page_ext
should have a different allocation lifetime expectancy than a real page.
Quite confusing if you ask me.
-- 
Michal Hocko
SUSE Labs