On 9/8/23 13:13, Thomas Hellström wrote:
On 9/8/23 11:14, Christian König wrote:
On 9/8/23 11:04, Thomas Hellström wrote:
On 9/8/23 10:52, Christian König wrote:
On 9/8/23 09:37, Thomas Hellström wrote:
Hi,
On 9/7/23 16:49, Christian König wrote:
On 9/7/23 16:47, Thomas Hellström wrote:
Hi,
On 9/7/23 16:37, Christian König wrote:
On 9/7/23 15:53, Thomas Hellström wrote:
While trying to replicate a weird drm_exec lock alloc tracking warning using the drm_exec kunit test, the warning was shadowed by a UAF warning from KASAN due to a bug in the drm kunit helpers.
Patch 1 fixes that drm kunit UAF.
Patch 2 introduces a drm_exec kunit subtest that fails if the conditions for the weird warning are met.
The series previously also had a patch with a drm_exec workaround for the warning, but that patch has already been committed to drm-misc-next-fixes.
Thinking more about this, what happens when somebody calls drm_exec_unlock_obj() on the first locked object?
Essentially the same thing. I've been thinking about the best way to handle that, but I'm not sure what the best one is.
Well what does lockdep store in that object in the first place?
Could we fix that somehow?
Lockdep maintains an array of held locks (lock classes) for each task. Upon freeing memory, that array is traversed to see whether the freed address matches the address stored for a held lock. This also has the interesting side effect that, IIRC, dma_resv_assert_held() checks whether *any* dma_resv is held...
Ideally each object would have its own class instance, but I think
some applications would then exhaust the array size.
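To make that concrete, here is a rough sketch (not from the patches; toy_obj and toy_lock_pair are made up) of the kind of sequence involved: several dma_resv locks taken under one ww_acquire_ctx, where lockdep consolidates the later acquisitions onto the first held-lock entry, so freeing the memory backing that first lock while the others are still held trips the "held lock freed" check.

#include <linux/dma-resv.h>
#include <linux/slab.h>
#include <linux/ww_mutex.h>

struct toy_obj {
	struct dma_resv resv;
};

static int toy_lock_pair(struct toy_obj *a, struct toy_obj *b)
{
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &reservation_ww_class);

	/* Becomes lockdep's "first" held lock for this class. */
	ret = dma_resv_lock(&a->resv, &ctx);
	if (ret)
		goto out;

	/* Only bumps the reference count on that first entry. */
	ret = dma_resv_lock(&b->resv, &ctx);
	if (ret) {
		/* Real code would go through the ww slowpath here. */
		dma_resv_unlock(&a->resv);
		goto out;
	}

	/* kfree(a) at this point, with b still locked, triggers the warning. */

	dma_resv_unlock(&b->resv);
	dma_resv_unlock(&a->resv);
out:
	ww_acquire_fini(&ctx);
	return ret;
}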
IIRC Daniel once explained to me that he designed lockdep for
ww_mutexes like this for some reason, but I don't remember the
details any more.
Maybe lockdep wouldn't otherwise be able to deal with the fact that
you could lock them in any order or something like that.
Oh, that's handled well with the mutex_lock_nest_lock() type of annotation that's used for ww_mutexes. IIRC the problem is that lockdep can't really deal with either the vast number of locks overall or the vast number of held locks per process.
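For reference, the nest-lock style of annotation looks roughly like this outside of ww_mutex (a hedged sketch with made-up names; ww_mutex uses a similar annotation internally, with the acquire context acting as the "nest lock"):

#include <linux/mutex.h>

static DEFINE_MUTEX(outer_lock);

/*
 * Lock a set of child mutexes in arbitrary order; the outer lock
 * serializes all takers, which is what makes the ordering safe and
 * what the nest-lock annotation tells lockdep.
 */
static void lock_children(struct mutex *children, unsigned int count)
{
	unsigned int i;

	mutex_lock(&outer_lock);
	for (i = 0; i < count; i++)
		mutex_lock_nest_lock(&children[i], &outer_lock);

	/* ... operate on all children ... */

	while (count--)
		mutex_unlock(&children[count]);
	mutex_unlock(&outer_lock);
}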
Could we somehow teach lockdep that multiple locks of a lock class
can be held at the same time? E.g. like a reference count in the
lockclass or something like that?
I'll dig a bit deeper into this.
Meanwhile, for the unlock problem: looking at how the unlocks are used in i915, it's typically locks that are grabbed during eviction and released again once validation of a single object has succeeded. The risk of them ending up as the first lock is small, unless they are prelocked as the contended lock. But for these "temporary" objects, the prelocked lock is immediately dropped after locking and is only used to find something suitable to wait for to relax the ww transaction.
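For illustration, that pattern looks roughly like this with drm_exec (a hedged sketch: evict_candidate() and validate() are made-up driver helpers, and drm_exec_init()'s signature differs between kernel versions):

#include <drm/drm_exec.h>
#include <drm/drm_gem.h>

/* Hypothetical driver helpers, for illustration only. */
struct drm_gem_object *evict_candidate(struct drm_gem_object *obj);
int validate(struct drm_gem_object *obj, struct drm_gem_object *victim);

static int validate_with_eviction(struct drm_gem_object *obj)
{
	struct drm_exec exec;
	int ret = 0;

	drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
	drm_exec_until_all_locked(&exec) {
		struct drm_gem_object *victim;

		ret = drm_exec_lock_obj(&exec, obj);
		drm_exec_retry_on_contention(&exec);
		if (ret)
			break;

		/* Temporarily lock an eviction candidate... */
		victim = evict_candidate(obj);
		if (victim) {
			ret = drm_exec_lock_obj(&exec, victim);
			drm_exec_retry_on_contention(&exec);
			if (ret)
				break;

			ret = validate(obj, victim);
			/* ...and drop it again once validation is done. */
			drm_exec_unlock_obj(&exec, victim);
			if (ret)
				break;
		}
	}
	drm_exec_fini(&exec);
	return ret;
}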
Yeah, I don't see this as a use case in reality. It's more of a "what if?" thing.
Oh, it's a real use-case. As soon as you start having sleeping locks
for eviction you hit it, in particular with WW mutex slowpath
debugging. And we will need to work on improving TTM support for
that for xe.
Oh, good point! When we have contention on a lock, we roll back and take that lock first; it can then happen that this lock needs to be unlocked again. Unlikely, but certainly possible.
Sounds like we really need to fix this in lockdep then.
So it seems lockdep *does* do reference counting in this case, but it stores the address of the first locked lockdep map and then subsequently uses it for various things. In short, freeing the first lock isn't something lockdep thinks you should do. Ever.
The good thing about this is that the refcounting appears to be done only on nest locks, that is, when we have a ww context, AFAICT. That means we can probably store a fake ww_mutex lockdep map with the ww acquire context, lock it when we initialize the context, and unlock it on ww_acquire_fini().
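Roughly like this, as a hedged sketch of that idea (not an actual patch; the first_lock_dep_map field does not exist in struct ww_acquire_ctx, it just stands in for whatever fake map would be embedded there):

#include <linux/lockdep.h>
#include <linux/ww_mutex.h>

#ifdef CONFIG_DEBUG_LOCK_ALLOC
static void ww_ctx_fake_lock_init(struct ww_acquire_ctx *ctx,
				  struct ww_class *ww_class)
{
	/* Hypothetical extra map embedded in struct ww_acquire_ctx. */
	lockdep_init_map(&ctx->first_lock_dep_map, ww_class->mutex_name,
			 &ww_class->mutex_key, 0);
	/* "Lock" it when the acquire context is initialized... */
	lock_acquire(&ctx->first_lock_dep_map, 0, 0, 0, 1, NULL, _RET_IP_);
}

static void ww_ctx_fake_lock_fini(struct ww_acquire_ctx *ctx)
{
	/* ...and "unlock" it again from ww_acquire_fini(). */
	lock_release(&ctx->first_lock_dep_map, _RET_IP_);
}
#endif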
That should take care of the problem, I think, although the problem of lockdep_assert() and lock-freeing granularity will remain. It looks like there is a comparison function one can optionally set to make different objects look separate to lockdep. Probably something to think about for enhanced debugging with a limited set of locked objects. I also need to check what happens if we do a sequence of successful trylocks.
OK, nested trylocks indeed seem to store one instance per lock, so they are not prone to the problem. For locks under a ww_acquire_ctx, the solution outlined above appears to work, and it's restricted to lockdep code only.
/Thomas
/Thomas
Christian.
If we were to implement something similar in drm_exec, we'd need
an interface to mark an object as "temporary" when locking, and
make sure we drop those objects if they end up as "prelocked".
Personally I think this solution works well and would be my
preferred choice.
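Just to make that concrete, such an interface might look something like this (purely illustrative; the function name is invented and nothing like it exists in drm_exec today):

/*
 * Hypothetical variant of drm_exec_lock_obj() that tags the lock as
 * temporary, so drm_exec could drop the object again instead of
 * keeping it around as the prelocked/contended one.
 */
int drm_exec_lock_obj_temporary(struct drm_exec *exec,
				struct drm_gem_object *obj);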
Yet another alternative would be to keep a reference even to the unlocked objects...
But these workarounds ofc only push the problem out of drm_exec.
Users of raw dma-resv or ww mutexes would still wonder what's
going on.
Agree completely. This is really a bug in lockdep, or rather in how we chose to implement ww_mutexes in lockdep, and should therefore be fixed there I think.
Christian.
/Thomas
Christian.
/Thomas
Christian.
v2:
- Rewording of commit messages
- Add some commit message tags
v3:
- Remove an already committed patch
- Rework the test to not require dmesg inspection (Maxime Ripard)
- Condition the test on CONFIG_DEBUG_LOCK_ALLOC
- Update code comments and commit messages (Maxime Ripard)
Cc: Maxime Ripard <mripard@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Thomas Hellström (2):
drm/tests: helpers: Avoid a driver uaf
drm/tests/drm_exec: Add a test for object freeing within drm_exec_fini()
drivers/gpu/drm/tests/drm_exec_test.c | 82 +++++++++++++++++++++++++++
include/drm/drm_kunit_helpers.h       |  4 +-
2 files changed, 85 insertions(+), 1 deletion(-)