Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock

Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx> · Sun, 30 Oct 2016 19:22:01 +0000

On 10/28/16 19:08, James Bottomley wrote:
> This is a deadlock caused by an inversion issue in kernfs (suicide vs
> non-suicide removes); so fixing it in SCSI alone really isn't
> appropriate.  I count at least five other subsystems all using this
> mechanism, so they'll all be similarly affected.  It looks to be fairly
> simply fixable inside kernfs, so please fix it that way.

Hello James,

Can you clarify this further? To me this looks like the result of how 
the SCSI core works rather than an issue in the kernfs layer. My 
interpretation of the deadlock report produced by the lockdep code is as 
follows:
* The SCSI scanning code holds scan_mutex while creating sysfs
   attributes for a SCSI device. In this case scan_mutex is the outer
   mutex and s_active the inner locking object.
* scsi_remove_host() holds scan_mutex while removing sysfs attributes.
   In this case too, scan_mutex is the outer mutex and s_active the
   inner locking object.
* During self-removal (sysfs_remove_file_self() being called indirectly
   by kernfs_fop_write()), kernfs_fop_write() holds s_active while
   scsi_remove_device() is being called. In this case s_active is the
   outer locking object and scan_mutex the inner locking object.
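
To make the three orderings above concrete, here is a minimal user-space
sketch in C. It is not kernel code and not how lockdep is implemented; the
names lock_id, lock_order, observed and has_inversion() are hypothetical and
exist only for this illustration. It records each path as an (outer, inner)
pair and checks whether any two paths take the same pair of locks in
opposite order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two locking objects discussed above. */
enum lock_id { SCAN_MUTEX, S_ACTIVE, NUM_LOCKS };

/* One observed nesting: 'outer' is held while 'inner' is acquired. */
struct lock_order {
	const char *path;
	enum lock_id outer;
	enum lock_id inner;
};

/* The three code paths from the list above, in the same order. */
const struct lock_order observed[] = {
	{ "SCSI scan creating sysfs attributes", SCAN_MUTEX, S_ACTIVE },
	{ "scsi_remove_host()",                  SCAN_MUTEX, S_ACTIVE },
	{ "kernfs_fop_write() self-removal",     S_ACTIVE,   SCAN_MUTEX },
};

/*
 * Return true if the first n entries contain two paths that nest the
 * same pair of locks in opposite order (an AB/BA inversion).
 */
bool has_inversion(const struct lock_order *o, size_t n)
{
	bool dep[NUM_LOCKS][NUM_LOCKS] = { { false } };

	/* Record every outer -> inner dependency. */
	for (size_t i = 0; i < n; i++) {
		assert(o[i].outer != o[i].inner);
		dep[o[i].outer][o[i].inner] = true;
	}

	/* An inversion exists if both a -> b and b -> a were recorded. */
	for (int a = 0; a < NUM_LOCKS; a++)
		for (int b = a + 1; b < NUM_LOCKS; b++)
			if (dep[a][b] && dep[b][a])
				return true;

	return false;
}
```

With all three entries, has_inversion(observed, 3) returns true. With only
the first two (scanning and scsi_remove_host()), it returns false, which
shows that the self-removal path is the one that introduces the inversion.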

I think it is essential that kernfs_fop_write() holds s_active. So to
me this looks like a lock inversion issue that cannot be fixed by
modifying kernfs alone. In other words, the SCSI core has to be modified
to fix this. Do you agree?

Thanks,

Bart.
