Re: [PATCH] drm: mali-dp: Add check for kzalloc

Robin Murphy <robin.murphy@xxxxxxx> · Wed, 7 Dec 2022 19:23:00 +0000

On 2022-12-07 15:29, Liviu Dudau wrote:
> On Wed, Dec 07, 2022 at 01:59:04PM +0000, Robin Murphy wrote:
>> On 2022-12-07 09:21, Jiasheng Jiang wrote:
>>> As kzalloc may fail and return NULL pointer, it should be better to check
>>> the return value in order to avoid the NULL pointer dereference in
>>> __drm_atomic_helper_connector_reset.
>>
>> This commit message is nonsense; if __drm_atomic_helper_connector_reset()
>> would dereference the NULL implied by &mw_state->base, it would equally
>> still dereference the explicit NULL pointer passed after this patch.
>
> Where?

Exactly, that function already checks conn_state for NULL anyway, so any 
reasoning based on it not doing that is clearly erroneous. Even if 
something else changed in future to actually make this a bug, it still 
wouldn't strictly dereference NULL, but some small non-NULL value.
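
To make the "small non-NULL value" point concrete, here is a minimal userspace sketch. The struct names are hypothetical stand-ins, not the real DRM or malidp definitions, and the "moved" layout is purely an assumption about what a future change could look like. It only shows what offset the compiler would add to mw_state when forming &mw_state->base in each case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the DRM connector state; not the real type. */
struct conn_state_stub {
	int dummy;
};

/* Mirrors the current situation: "base" is the first member. */
struct mw_state_first {
	struct conn_state_stub base;
	int extra;
};

/* Assumed future layout in which "base" is no longer first. */
struct mw_state_moved {
	int extra;
	struct conn_state_stub base;
};

/*
 * One possible build-time guard: if someone reorders the struct, the
 * "NULL + 0 == NULL" reasoning stops holding and the build fails.
 */
static_assert(offsetof(struct mw_state_first, base) == 0,
	      "base is expected to be the first member");

/* Offset added to the struct pointer when taking the address of base. */
size_t base_offset_first(void)
{
	return offsetof(struct mw_state_first, base);
}

size_t base_offset_moved(void)
{
	return offsetof(struct mw_state_moved, base);
}
```

With the first layout the offset is 0, so a NULL struct pointer yields a NULL member pointer. With the assumed second layout the offset is non-zero, so the callee would see a small non-NULL address and its NULL check would no longer catch the failed allocation.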

>> The current code works out OK because "base" is the first member of struct
>> malidp_mw_connector_state, thus if mw_state is NULL then &mw_state->base ==
>> NULL + 0 == NULL. Now you *could* argue that this isn't robust if the layout
>> of struct malidp_mw_connector_state ever changes, and that could be a valid
>> justification for making this change, but the reason given certainly isn't.
>
> I appreciate the input and I agree with your analysis, however I don't have the same
> confidence that compilers will always do the NULL + 0 math to get address of base.
> Would this always work when you have authenticated pointers or is the compiler going
> to generate some plumbing code that checks the pointer before doing the math?

For the current definition of struct malidp_mw_connector_state, 
&mw_state->base is equal to mw_state, that's just how C works:

"A pointer to a structure object, suitably converted, points to its 
initial member (or if that member is a bit-field, then to the unit in 
which it resides), and vice versa. There may be unnamed padding within a 
structure object, but not at its beginning."
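
As a small illustration of that guarantee for a valid object, the following sketch compares the address of a structure with the address of its initial member. The type and function names are made up for the example and have nothing to do with the driver.

```c
#include <stddef.h>

/* Illustrative types only; names are not taken from the kernel. */
struct inner_example {
	int value;
};

struct outer_example {
	struct inner_example base;	/* initial member */
	int extra;
};

/*
 * Returns 1 if a pointer to the structure and a pointer to its initial
 * member compare equal once both are converted to void *, else 0.
 */
int first_member_aliases_struct(void)
{
	struct outer_example obj = { .base = { .value = 1 }, .extra = 2 };

	return (void *)&obj == (void *)&obj.base;
}
```

The function returns 1 on any conforming implementation, since no padding is allowed before the initial member.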

Indeed a C compiler is technically at liberty to make checks for whether 
any pointer points to a valid object when evaluating it, but in practice 
no compiler is going to do that because it would be horrendously 
inefficient, and since the behaviour of dereferencing an invalid pointer 
is undefined, compilers are also able to simply assume all pointers are 
valid and generate good code based on that. Don't forget that there are 
several compiler optimisations that Linux actually depends on; AFAICT 
this is one of them.

>> Arithmetic on a (potentially) NULL pointer may well be a sign that it's
>> worth a closer look to check whether it really is what the code intended to
>> do, but don't automatically assume it has to be a bug. Otherwise, good luck
>> with "fixing" every user of container_of() throughout the entire kernel.
>
> My understanding is that you're supposed to use container_of() only when you're sure
> that your pointer is valid. container_of_safe() seems to be the one to use when you
> don't care about NULL pointers.

I was thinking more along the lines of the "((type *)0)->member" 
expression in the definition, but fair enough, that's perhaps not the 
best example since you can argue it's an operand of typeof() which won't 
actually be evaluated. Try `git grep '&((.\+ *)\(0\|NULL\))->'` for more 
examples that will be. If none of those are going to work as intended, 
the kernel likely has bigger problems than how one driver might behave 
in OOM conditions.
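
For readers less familiar with the pattern under discussion, here is a simplified userspace sketch of a container_of-style macro plus a NULL-aware wrapper. These are not the kernel's definitions: the macro names, the types and the helper are all hypothetical, the kernel versions carry extra type checking, and the wrapper only handles NULL (the kernel's container_of_safe() also deals with error pointers).

```c
#include <stddef.h>

/* Simplified sketch: step back from a member pointer to its container. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL-aware variant: only do the arithmetic on a non-NULL pointer. */
#define my_container_of_or_null(ptr, type, member) \
	((ptr) ? my_container_of(ptr, type, member) : (type *)NULL)

/* Illustrative types; "node" is deliberately not the first member. */
struct item_example {
	int id;
};

struct wrapper_example {
	int tag;
	struct item_example node;
};

/* Hypothetical helper: recover the wrapper from its embedded item. */
struct wrapper_example *wrapper_from_item(struct item_example *it)
{
	return my_container_of_or_null(it, struct wrapper_example, node);
}
```

Passing the address of an embedded node gives back the enclosing wrapper, while passing NULL gives back NULL rather than a pointer offset backwards from address zero, which is the distinction between the two macros that the thread is talking about.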

Anyway, like I say I'm not objecting to the code change - even if the 
current non-bug wasn't an oversight, it's still a bit too clever for its 
own good. However, if the *justification* for making that change is 
going to go beyond "do this because static analysis suggested it", then 
it needs to explain a potential issue that actually exists and is worthy 
of fixing, not make up one that doesn't.

Cheers,
Robin.