Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking

On 07.08.20 at 11:23, Daniel Vetter wrote:
On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
On Fri, Aug 7, 2020 at 11:08 AM Christian König
<christian.koenig@xxxxxxx> wrote:
[SNIP]
What we should do instead is to make sure we have only a single lock for the complete hive.
[Dennis Li] If we use a single lock, users will have to wait for all devices to resume successfully, but in fact their tasks are only running on device A. It is not friendly to users.
Well that is intentional I would say. We can only restart submissions after all devices have resumed successfully, because otherwise it can happen that a job on device A depends on memory on device B which isn't accessible.
[Dennis Li] Yes, you are right. The driver has to make sure that the shared resources (such as the shared memory) are ready before unlocking the locks of the devices one by one. But each device still has private hardware resources such as video and display.
Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well.

I still think that a single rwsem for the whole hive is the best option here.

[Dennis Li] For your above concern, we can open a new thread to discuss it. If we decide to use a single lock after that discussion, we can create another patch for it.
That's a really good argument, but I still hesitate to merge this patch.
How severe is the lockdep splat?

At the bare minimum we need a "/* TODO: ... */" comment explaining why we do this and how to remove the workaround again.
[Dennis Li] It is not a workaround. According to the design of lockdep, each rw_semaphore should have a separate lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it.
Yeah, but this explanation is not really sufficient. Again, this is not
a limitation of init_rwsem, but intentional behavior.

In other words in most cases lockdep should splat if you use rw
semaphores like this.

That we avoid this by always locking multiple amdgpu devices in the same
order is rather questionable driver design.
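
(For readers following along: a simplified sketch of why this is intentional. init_rwsem() declares one static lock_class_key per call site, so every rwsem initialized from the same line of code shares a single lockdep class; giving each device its own key, as the patch does, just stops lockdep from comparing those semaphores at all. Paraphrased from include/linux/rwsem.h:)

#define init_rwsem(sem)						\
do {								\
	static struct lock_class_key __key;			\
								\
	__init_rwsem((sem), #sem, &__key);			\
} while (0)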
Yeah, don't give out multiple locking classes like that; that's fairly
terrible.

Ok, that is exactly the answer I feared we would get.

The right fix is mutex_lock_nest_lock, which we're using in
ww_mutex and which is also used in mm_take_all_locks.

Ah! Enlightenment! So we should use down_write_nested() in this special case and all is well?
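
(Side note for readers: the rwsem counterpart of mutex_lock_nest_lock() is down_write_nest_lock(). Below is a minimal, purely hypothetical sketch of that pattern for a hive; the structure and field names are made up and only stand in for the real amdgpu/XGMI ones.)

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>

/* Hypothetical stand-ins for the real amdgpu/XGMI structures. */
struct fake_device {
	struct rw_semaphore reset_sem;
	struct list_head hive_node;
};

struct fake_hive {
	struct mutex hive_lock;		/* serializes whole-hive resets */
	struct list_head device_list;
};

/*
 * Take every per-device rwsem while holding the outer hive lock.  Since
 * the outer lock excludes concurrent takers, lockdep accepts repeated
 * acquisitions of the same rwsem class instead of reporting recursion
 * (the same idea mm_take_all_locks() relies on).
 */
static void fake_hive_lock_all(struct fake_hive *hive)
{
	struct fake_device *adev;

	mutex_lock(&hive->hive_lock);
	list_for_each_entry(adev, &hive->device_list, hive_node)
		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
}

static void fake_hive_unlock_all(struct fake_hive *hive)
{
	struct fake_device *adev;

	list_for_each_entry(adev, &hive->device_list, hive_node)
		up_write(&adev->reset_sem);
	mutex_unlock(&hive->hive_lock);
}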


If you solve the lock ordering by sorting all the locks you need
(ugh), then you can also just use a fake lockdep_map for your special
function only.
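
(Again purely illustrative, with made-up names: the fake lockdep_map variant boils down to bracketing the take-all-locks path with one static map, so lockdep tracks a single pseudo-lock for the whole operation.)

#include <linux/lockdep.h>

static struct lockdep_map fake_hive_reset_map =
	STATIC_LOCKDEP_MAP_INIT("fake_hive_reset", &fake_hive_reset_map);

static void fake_hive_reset_begin(void)
{
	/* looks to lockdep like taking one lock covering the whole reset */
	lock_map_acquire(&fake_hive_reset_map);
}

static void fake_hive_reset_end(void)
{
	lock_map_release(&fake_hive_reset_map);
}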

Also in general the idea of an rwsem in conjunction with xgmi cross
device memory handling just plain freaks me out :-) Please make sure
you prime this lock against dma_resv and dma_fence (and maybe also
drm_modeset_lock if this lock is outside of dma_resv), and ideally
this should only ever be used for setup stuff when loading drivers. I
guess that's why you went with a per-device rwsem. If it's really only
for setup then just do the simple thing of having an
xgmi_hive_reconfigure lock, which all the rwsems nest within, and no
fancy lock ordering or other tricks.
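
(One way to do that priming, sketched with the made-up names from above and assuming the reset rwsem is always taken outside dma_resv: acquire both once at init time so lockdep records the intended ordering immediately, instead of only when a reset races a real submission.)

#include <linux/dma-resv.h>

static void fake_prime_reset_ordering(struct fake_device *adev,
				      struct dma_resv *resv)
{
	down_write(&adev->reset_sem);
	/* teach lockdep: dma_resv always nests inside the reset rwsem */
	dma_resv_lock(resv, NULL);
	dma_resv_unlock(resv);
	up_write(&adev->reset_sem);
}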

Well, this is for the group reset of all devices in a hive, and you need to reboot to change the order the devices are in the list.

Regards,
Christian.


   The core network driver (net/core/dev.c) has a similar use case to ours.
Just took a look at that, but this is a completely different use case. The
lockdep classes there are grouped by driver type to reflect the differences in the
network stack.

A quick search showed that no other device driver in the kernel is doing
this, so I'm not sure such behavior wouldn't be rejected. Adding
Daniel to comment on this as well.

Felix had some really good arguments to make that an urgent fix, so
either we come up with some real rework of this or at least add a big
"/* TODO....*/".
Aside from "I have questions" I don't think there's any reason to not
fix this correctly with mutex_lock_nest_lock. It really shouldn't be a
semantic change from what I understand here.
Also one more with my drm maintainer hat on, as a heads-up: Dave & me
looked at i915 gem code a bit too much, and we're appalled at the
massive over-use of lockdep class hacks and dubious nesting
annotations. Expect a crackdown on anyone who tries to play clever
tricks here; we have a few too many already :-)

So no mutex_lock_nested (totally different beast from
mutex_lock_nest_lock, and really unsafe since it's a runtime change of
the lockdep class) and also not any lockdep_set_class trickery or
anything like that. We need locking hierarchies which mere humans can
comprehend and review.
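
(Illustration of that distinction with throwaway names: mutex_lock_nested() only overrides the subclass number at runtime, while mutex_lock_nest_lock() records that an already-held outer lock serializes all takers of the inner class.)

#include <linux/mutex.h>

struct fake_obj {
	struct mutex lock;
};

static DEFINE_MUTEX(fake_outer);

static void fake_take_two(struct fake_obj *a, struct fake_obj *b)
{
	mutex_lock(&fake_outer);
	/* ok: fake_outer excludes concurrent takers of this lock class */
	mutex_lock_nest_lock(&a->lock, &fake_outer);
	mutex_lock_nest_lock(&b->lock, &fake_outer);
	/* ... */
	mutex_unlock(&b->lock);
	mutex_unlock(&a->lock);
	mutex_unlock(&fake_outer);
}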

Cheers, Daniel

Cheers, Daniel

Regards,
Christian.

Regards,
Christian.


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch/



_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



