On 9/17/21 10:04 PM, Luis Chamberlain wrote:
> A sketch of how this can happen follows:
>
> CPU A                              CPU B
>                                    whatever_store()
> module_unload
>   mutex_lock(foo)
>                                    mutex_lock(foo)
>    del_gendisk(zram->disk);
>      device_del()
>        device_remove_groups()
>
> In this situation whatever_store() is waiting for the mutex foo to
> become unlocked, but that won't happen until module removal is
> complete. But module removal won't complete until the sysfs file
> being poked completes which is waiting for a lock already held.
If I remember correctly, I first encountered the deadlock scenario
described above about ten years ago while working on the SCST project.
We solved it by having the module unload code remove the sysfs
attributes, e.g. by calling sysfs_remove_file(), before taking mutex
foo. This works because calling sysfs_remove_file() multiple times in
a row is safe.

Is that solution good enough for the zram driver?

Thanks,

Bart.
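
To make the ordering concrete, here is a rough userspace model of the
approach described above. It is only a sketch built on pthreads, not
kernel code: model_attr, model_remove_attr(), model_store(),
model_unload() and run_demo() are hypothetical names invented for this
example, and the model assumes that removing an attribute blocks new
handlers and waits for running ones to finish, which is the behaviour
that makes the original sequence deadlock.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_attr {
	pthread_mutex_t lock;    /* protects the fields below */
	pthread_cond_t  drained; /* signalled when active drops to 0 */
	int  active;             /* handlers currently running */
	bool removed;            /* set by the first removal */
};

static struct model_attr attr = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false
};
static pthread_mutex_t foo = PTHREAD_MUTEX_INITIALIZER;
static int stores_done;          /* protected by foo */

/*
 * Stands in for sysfs_remove_file(): refuse new handlers, then wait for
 * running ones. Returns 1 if this call did the removal, 0 if the
 * attribute was already gone, so repeated calls are harmless.
 */
static int model_remove_attr(struct model_attr *a)
{
	int first;

	pthread_mutex_lock(&a->lock);
	first = !a->removed;
	a->removed = true;
	while (a->active > 0)
		pthread_cond_wait(&a->drained, &a->lock);
	pthread_mutex_unlock(&a->lock);
	return first;
}

/* Stands in for whatever_store(): 0 on success, -1 if the attribute is gone. */
static int model_store(struct model_attr *a)
{
	pthread_mutex_lock(&a->lock);
	if (a->removed) {
		pthread_mutex_unlock(&a->lock);
		return -1;
	}
	a->active++;
	pthread_mutex_unlock(&a->lock);

	pthread_mutex_lock(&foo);        /* the lock the handler needs */
	stores_done++;
	pthread_mutex_unlock(&foo);

	pthread_mutex_lock(&a->lock);
	a->active--;
	assert(a->active >= 0);
	if (a->active == 0)
		pthread_cond_broadcast(&a->drained);
	pthread_mutex_unlock(&a->lock);
	return 0;
}

static void *store_thread(void *arg)
{
	for (int i = 0; i < 1000; i++)
		if (model_store(arg) != 0)
			break;
	return NULL;
}

/*
 * Unload path with the suggested ordering: remove the attribute first,
 * and only then take foo. Returns how many calls actually removed it.
 */
static int model_unload(struct model_attr *a)
{
	int removals = model_remove_attr(a);  /* foo is not held here */

	pthread_mutex_lock(&foo);
	removals += model_remove_attr(a);     /* models the later teardown path */
	pthread_mutex_unlock(&foo);
	return removals;
}

/* Runs a store thread concurrently with unload; returns the removal count. */
static int run_demo(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, store_thread, &attr) != 0)
		return -1;
	int removals = model_unload(&attr);
	pthread_join(t, NULL);
	return removals;
}
```

In this model the first removal drains any running handler while foo is
still free, so the handler can take foo and finish. The second removal,
done with foo held, finds nothing left to wait for and returns at once.
Moving the first model_remove_attr() call after pthread_mutex_lock(&foo)
would recreate the deadlock from the sketch quoted above.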