On Sun, 2016-10-30 at 19:22 +0000, Bart Van Assche wrote:
> On 10/28/16 19:08, James Bottomley wrote:
> > This is a deadlock caused by an inversion issue in kernfs (suicide vs
> > non-suicide removes); so fixing it in SCSI alone really isn't
> > appropriate. I count at least five other subsystems all using this
> > mechanism, so they'll all be similarly affected. It looks to be fairly
> > simply fixable inside kernfs, so please fix it that way.
>
> Hello James,
>
> Can you clarify this further? To me this looks like the result of how
> the SCSI core works rather than an issue in the kernfs layer.

I'm at a bit of a loss; the problem looks clear from the original trace,
so I'm not really sure what's not clear to you.

The inversion is between the scan mutex and s_active, which is the
rather fanciful name Tejun gave to the hand-rolled mutex in kernfs_node.

The reason for the inversion is that s_active is taken when you open a
sysfs file, including the delete one. There's a special suicide path to
allow that file to be deleted while something else holds the lock.
However, if the delete path also takes any lock, and there's a way to
get into delete not via writing to sysfs (which is pretty much
universally true), then you get an inversion, because the kernfs_node
mutex is also taken when the file is removed. That is why it's not
specific to SCSI.

Since you press the issue, I've got to say I'm not a huge fan of trying
to escape from a lock inversion by making some path asynchronous,
because it usually leads to even more problems down the road. If
there's some problem with the generic fix, there is a way of fixing this
in SCSI without introducing asynchronicity.

James
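
For illustration only, the inversion described above can be sketched as
a toy lock-order check. This is not kernel code: the path names and the
find_inversions helper are hypothetical, and only the lock names
s_active and scan_mutex come from the message. It assumes each path is
reduced to the order in which it acquires its locks.

```python
from collections import defaultdict


def find_inversions(paths):
    """Return sorted lock pairs that different paths acquire in opposite orders.

    `paths` maps a (hypothetical) path name to the list of locks it
    acquires, in acquisition order.
    """
    # (first, second) -> names of the paths that take `first` before `second`
    order = defaultdict(set)
    for name, locks in paths.items():
        for i, first in enumerate(locks):
            for second in locks[i + 1:]:
                order[(first, second)].add(name)

    inversions = []
    for (a, b) in order:
        # Report each unordered pair once, and only if the reverse order exists.
        if a < b and (b, a) in order:
            inversions.append((a, b))
    return sorted(inversions)


# Assumed model of the two ways into the delete path:
paths = {
    # Writing to the sysfs delete file: s_active is held, then the scan mutex.
    "sysfs_delete_write": ["s_active", "scan_mutex"],
    # Removal triggered elsewhere: scan mutex is held, then removing the
    # sysfs file needs s_active.
    "other_removal": ["scan_mutex", "s_active"],
}

print(find_inversions(paths))
```

Any path that reaches the delete code while already holding a lock, and
then has to remove the sysfs file, shows up as the reverse edge in such
a check, whichever subsystem owns that lock.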