On 17.08.24 08:24, Barry Song wrote:
From: Barry Song <v-songbaohua@xxxxxxxx>
-v3:
* collect reviewed-by, acked-by etc. Michal, Christoph, Vlastimil, Davidlohr,
thanks!
* use Jason's patch[1] to fix vdpa and refine his changelog.
* refine changelogs
[1] https://lore.kernel.org/all/20240808054320.10017-1-jasowang@xxxxxxxxxx/
-v2:
https://lore.kernel.org/linux-mm/20240731000155.109583-1-21cnbao@xxxxxxxxx/
* adjust vpda fix according to Jason and Michal's feedback, I would
expect Jason to test it, thanks!
* split BUG_ON of unavoidable failure and the case GFP_ATOMIC |
__GFP_NOFAIL into two patches according to Vlastimil and Michal.
* collect Michal's acked-by for patch 2 - the doc;
* remove the new GFP_NOFAIL from this series, that one would be a
separate enhancement patchset later on.
-v1:
https://lore.kernel.org/linux-mm/20240724085544.299090-1-21cnbao@xxxxxxxxx/
__GFP_NOFAIL carries the semantics of never failing, so its callers
do not check the return value:
%__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
cannot handle allocation failures. The allocation could block
indefinitely but will never return with failure. Testing for
failure is pointless.
You should have called this series "mm: clarify nofail and BUG_ON on
with invalid arguments with nofail"
I was a bit surprised to find that many BUG_ON in this series ;)
However, __GFP_NOFAIL can sometimes fail if it exceeds size limits
or is used with GFP_ATOMIC/GFP_NOWAIT in a non-sleepable context.
This can expose security vulnerabilities due to potential NULL
dereferences.
Which code is that that we are concerned about? Some OOT driver? Some
untested in-tree driver? Just trying to understand what we are afraid
about here. I'm afraid we cannot safe the world with some OOT drivers :(
Are we aware of such security vulnerabilities (out of pure interest!)?
Is the expectation that these code paths are never triggered during
early testing where a WARN_ON_ONCE() would just highlight -- in addition
to a likely crash -- that something is very wrong?
Since __GFP_NOFAIL does not support non-blocking allocation, we introduce
GFP_NOFAIL with inclusive blocking semantics and encourage using GFP_NOFAIL
as a replacement for __GFP_NOFAIL in non-mm.
I still wonder if specifying GFP_NOFAIL should not simply come with the
semantics "implies direct reclaim. If called for non-blocking
allocations, bad things will happen (no security bad things though)." ;)
If we must still fail a nofail allocation, we should trigger a BUG rather
than exposing NULL dereferences to callers who do not check the return
value.
I am not convinced that BUG_ON is the right tool here to save the world,
but I see how we arrived here.
--
Cheers,
David / dhildenb