Hi Linus,
On 2024/8/22 14:40, Linus Torvalds wrote:
> On Thu, 22 Aug 2024 at 14:21, Michal Hocko <mhocko@xxxxxxxx> wrote:
>> The reality disagrees because there is a real demand for real GFP_NOFAIL
>> semantic. By that I do not mean arbitrary requests and sure GFP_NOFAIL
>> for higher orders is really hard to achieve but kvmalloc GFP_NOFAIL for
>> anything larger than PAGE_SIZE is doable without a considerable burden
>> on the MM end.
>
> Doable? Sure. Sensible? Not clear.
>
> I do not find a single case of that in the kernel.
>
> I did find three cases of kvcalloc(NOFAIL) in the nouveau driver and
> one in erofs. It's not clear that any of them make much sense (or that
> the erofs one is actually a large allocation).
I haven't followed the whole thread due to other ongoing internal work,
but EROFS can do a _large_ kvmalloc NOFAIL allocation by the
PAGE_ALLOC_COSTLY_ORDER measure (~24KB at most due to an on-disk
restriction). The details are outlined in my previous reply (and thread):
https://lore.kernel.org/r/20d782ad-c059-4029-9c75-0ef278c98d81@xxxxxxxxxxxxxxxxx
Because EROFS needs page arrays for vmap before it does decompression,
in the worst case it needs a temporary page array of almost ~24KB,
but it is the end user's choice to use such extreme compression
(mostly just syzkaller-crafted images).
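
To put the ~24KB figure next to PAGE_ALLOC_COSTLY_ORDER, here is a rough
userspace sketch of the arithmetic. It assumes 4KiB pages and 8-byte
pointers; the helper names are made up for illustration and are not
kernel APIs. Under those assumptions a 24KB array holds about 3072 page
pointers, and a physically contiguous copy of it would need an order-3
block, which is the value of PAGE_ALLOC_COSTLY_ORDER in current kernels
(kvmalloc can of course fall back to vmalloc instead).

```c
#include <stddef.h>

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= bytes, similar in spirit to get_order().
 */
unsigned int order_for_bytes(size_t bytes, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Hypothetical helper: how many pointer-sized slots fit in the array. */
size_t slots_in_array(size_t bytes, size_t ptr_size)
{
    return bytes / ptr_size;
}
```

For example, order_for_bytes(24 * 1024, 4096) gives 3 and
slots_in_array(24 * 1024, 8) gives 3072.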
In my opinion, I'm not sure what the PAGE_ALLOC_COSTLY_ORDER restriction
means for a single allocation. Even if you don't consider a virtually
contiguous buffer, people could also do
< PAGE_ALLOC_COSTLY_ORDER allocations multiple times and put almost
the same heavy load on the whole system. And we also allow
direct/kswapd reclaim here.
The failure path is complex in some cases like this one, and it's hard
to reach or get right. If kvmalloc() is going to be restricted to
< PAGE_ALLOC_COSTLY_ORDER anyway, I guess I will use a global
static buffer (and a sleeping lock) as a worst-case fallback to satisfy
the extreme on-disk restriction.
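
Just to illustrate the fallback idea, here is a minimal userspace
sketch, not the actual EROFS code: malloc() stands in for kvmalloc(), a
pthread mutex stands in for a kernel sleeping lock, and the 24KB size,
the simulate_failure knob and all names are assumptions made up for the
example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed worst-case array size implied by the on-disk limit. */
#define FALLBACK_BYTES (24 * 1024)

/* One global buffer, serialized by a lock that may sleep. */
static void *fallback_buf[FALLBACK_BYTES / sizeof(void *)];
static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Try a normal allocation first; if it fails, wait for the global
 * buffer. Never returns NULL for bytes <= FALLBACK_BYTES.
 * simulate_failure exists only so the fallback path can be exercised.
 */
void *get_page_array(size_t bytes, int simulate_failure)
{
    void *p = simulate_failure ? NULL : malloc(bytes);

    if (p)
        return p;
    assert(bytes <= FALLBACK_BYTES);
    pthread_mutex_lock(&fallback_lock); /* held until put_page_array() */
    return fallback_buf;
}

/* Release either the heap allocation or the global buffer. */
void put_page_array(void *p)
{
    if (p == (void *)fallback_buf)
        pthread_mutex_unlock(&fallback_lock);
    else
        free(p);
}
```

The cost of this design is that users of the fallback buffer are fully
serialized, which seems acceptable only because the path should be
reached very rarely.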
Thanks,
Gao Xiang