Re: [PATCH v3 0/4] mm: clarify nofail memory allocation

David Hildenbrand <david@xxxxxxxxxx> · Thu, 22 Aug 2024 10:39:09 +0200

On 22.08.24 10:24, Michal Hocko wrote:
On Thu 22-08-24 19:57:41, Barry Song wrote:
Regarding the concern about 'leaving locks
behind' you have in that subthread, I believe there's no difference
when returning NULL, as it could still leave locks behind but offers
a chance for the calling process to avoid an immediate crash.

Yes, I have mentioned this risk just for completeness. Without having
some sort of unwinding mechanism we are doomed to not be able to handle
this.

The sole difference between just returning NULL and OOPSing right away
is that with the former a crash is not guaranteed to happen, and the
caller can cause actual harm by dereferencing non-oopsing addresses
close to 0, which a) would be much harder to track down and b) could
cause much more damage than killing the context right away.

Besides that, I believe we have many BUG_ON users which would really
prefer to just kill the current context instead. They just do not have
the means to do that, so OOPS_ON could be a safer way to stop bad users
and reduce the number of BUG_ONs as well.

To me that sounds better as well, but I was also wondering if it's easy 
to implement or easy to assemble from existing pieces.

Linus has a point that "retry forever" can also be nasty. I think the 
important part here is, though, that we report sufficient information 
(stacktrace), such that the problem can be debugged reasonably well, and 
not just having a locked-up system.

But then the question is: does it really make sense to differentiate 
between a NOFAIL allocation of MAX_ORDER and one of MAX_ORDER+1 under 
memory pressure (Linus also touched on that)? It could well take 
minutes/hours/days to satisfy a very large NOFAIL allocation. 
So callers should be prepared to run into effective lockups ... :/

NOFAIL shouldn't exist, or at least should not be used to that degree.

I am to blame myself: I made use of it in kernel/resource.c, where there 
is no turning back once memory unplug is 99% complete (the vmemmap 
having even been freed), but then we might have to allocate a new node 
in the resource tree when we have to split an existing one. Maybe there 
would be ways to preallocate before starting memory unplug, or to 
pre-split ...

But then again, sizeof(struct resource) is probably so small that it 
likely would never fail.
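
To make the preallocation idea a bit more concrete, here is a rough
user-space sketch. It is not the kernel/resource.c code: struct range,
range_punch() and unplug_range() are hypothetical names made up for
illustration, and malloc() stands in for a fallible kernel allocation.
The only point is the ordering: the allocation that may fail happens
while backing out is still possible, and the split itself never
allocates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a resource tree node; NOT struct resource. */
struct range {
	unsigned long start, end;	/* inclusive bounds */
	struct range *next;
};

/*
 * Remove [start, end] from r. The removed span must lie inside r and
 * must not cover all of r. If a piece remains on both sides, the
 * right-hand piece is placed in the caller's preallocated *spare node,
 * and *spare is set to NULL to mark it as consumed.
 * This function never allocates, so it cannot fail.
 */
static void range_punch(struct range *r, unsigned long start,
			unsigned long end, struct range **spare)
{
	assert(start >= r->start && end <= r->end && start <= end);
	assert(start > r->start || end < r->end);

	if (start > r->start && end < r->end) {
		struct range *right = *spare;

		*spare = NULL;
		right->start = end + 1;
		right->end = r->end;
		right->next = r->next;
		r->end = start - 1;
		r->next = right;
	} else if (start > r->start) {
		r->end = start - 1;	/* trim the tail */
	} else {
		r->start = end + 1;	/* trim the head */
	}
}

/* Returns 0 on success, -1 if the preallocation failed. */
static int unplug_range(struct range *r, unsigned long start,
			unsigned long end)
{
	/* Step 1: fallible allocation, nothing has been torn down yet. */
	struct range *spare = malloc(sizeof(*spare));

	if (!spare)
		return -1;

	/* Step 2: the irreversible teardown would happen here. */

	/* Step 3: update the bookkeeping without allocating. */
	range_punch(r, start, end, &spare);

	free(spare);	/* no-op if the spare node was consumed */
	return 0;
}
```

With a range of 0x1000-0x4fff, unplug_range(&r, 0x2000, 0x2fff) leaves
r as 0x1000-0x1fff and links a new node 0x3000-0x4fff behind it. The
cost is one allocation that goes unused whenever no split is needed.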

--
Cheers,

David / dhildenb