Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Tue, 21 Sep 2021 08:48:49 -0700

On Tue, Sep 21, 2021 at 1:24 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Luis Chamberlain
> > Sent: 17 September 2021 20:47
> >
> > When sysfs attributes use a lock also used on module removal we can
> > race to deadlock. This happens when for instance a sysfs file on
> > a driver is used, then at the same time we have module removal call
> > trigger. The module removal call code holds a lock, and then the sysfs
> > file entry waits for the same lock. While holding the lock the module
> > removal tries to remove the sysfs entries, but these cannot be removed
> > yet as one is waiting for a lock. This won't complete as the lock is
> > already held. Likewise module removal cannot complete, and so we deadlock.
>
> Isn't the real problem the race between a sysfs file action and the
> removal of the sysfs node?

Nope, that is taken care of by kernfs.

> This isn't really related to module unload - except that may
> well remove some sysfs nodes.

Nope, the issue is a deadlock that can happen due to a shared lock on
module removal and a driver sysfs operation.

> This is the same problem as removing any other kind of driver callback.
> There are three basic solutions:
> 1) Use a global lock - not usually useful.
> 2) Have the remove call sleep until any callbacks are complete.
> 3) Have the remove just request removal and have a final
>    callback (from a different context).

Kernfs already does a sort of combination of 1) and 2) but 1) is using
atomic reference counts.

> If the remove can sleep (as in 2) then there is a requirement
> on the driver code to not hold any locks across the 'remove'
> that can be acquired during the callbacks.

And this is the part that kernfs has no control over since the removal
and sysfs operation are implementation specific.

> Now, for sysfs, you probably only want to sleep the remove code
> while a read/write is in progress - not just because the node
> is open.
> That probably requires marking an open node 'invalid' and
> deferring delete to close.

This is already done by kernfs.

> None of this requires a reference count on the module.

You are missing the point to the other aspect of the try_module_get(),
it lets you also check if module exit has been entered. By using
try_module_get() you let the module exit trump proceeding with an
operation, therefore also preventing any potential use of a shared
lock on module exit and the driver specific sysfs operation.

  Luis