On Tue, 11 Jul 2023 17:47:50 -0700
Ziqi Zhao <astrajoan@xxxxxxxxx> wrote:

> The following 3 locks would race against each other, causing the
> deadlock situation in the Syzbot bug report:
>
> - j1939_socks_lock
> - active_session_list_lock
> - sk_session_queue_lock
>
> A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> the rare situations where a write lock is required for the linked list
> that j1939_socks_lock is protecting, the code does not attempt to
> acquire any more locks. This would break the circular lock dependency,
> where, for example, the current thread already locks j1939_socks_lock
> and attempts to acquire sk_session_queue_lock, and at the same time,
> another thread attempts to acquire j1939_socks_lock while holding
> sk_session_queue_lock.
>
> NOTE: This patch alone does not fix the unregister_netdevice bug
> reported by Syzbot; instead, it solves a deadlock situation to prepare
> for one or more further patches to actually fix the Syzbot bug, which
> appears to be a reference counting problem within the j1939 codebase.
>
> #syz test:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> Signed-off-by: Ziqi Zhao <astrajoan@xxxxxxxxx>
> ---

Reader-writer locks are not the best way to fix a lock hierarchy problem.
Instead, either fix the lock ordering or use RCU.
Other devices don't have this problem, so perhaps the unique locking
in this device is the problem.
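
To make the "fix the lock ordering" option concrete, here is a rough
userspace sketch using pthreads, not j1939 code. The enum names only echo
the three locks quoted above, the rank order is an assumption chosen for
illustration, and lock_pair(), unlock_pair(), worker() and run_demo() are
hypothetical helpers. The idea is that every path takes the locks in one
global order, so the ABBA cycle cannot form even when two callers name the
same pair in opposite order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three locks named in the patch.
 * The rank order is an assumption for this sketch, not the kernel's. */
enum lock_rank {
	RANK_SOCKS = 0,           /* stands in for j1939_socks_lock */
	RANK_ACTIVE_SESSIONS = 1, /* stands in for active_session_list_lock */
	RANK_SK_QUEUE = 2,        /* stands in for sk_session_queue_lock */
	RANK_COUNT
};

static pthread_mutex_t locks[RANK_COUNT] = {
	PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER,
};

static long shared_counter;

/* Take two distinct locks, always lowest rank first, whatever order
 * the caller names them in. */
static void lock_pair(enum lock_rank a, enum lock_rank b)
{
	enum lock_rank first = a < b ? a : b;
	enum lock_rank second = a < b ? b : a;

	pthread_mutex_lock(&locks[first]);
	pthread_mutex_lock(&locks[second]);
}

/* Release in the reverse of the acquisition order. */
static void unlock_pair(enum lock_rank a, enum lock_rank b)
{
	enum lock_rank first = a < b ? a : b;
	enum lock_rank second = a < b ? b : a;

	pthread_mutex_unlock(&locks[second]);
	pthread_mutex_unlock(&locks[first]);
}

struct worker_args {
	enum lock_rank a, b;
	int iterations;
};

static void *worker(void *p)
{
	struct worker_args *args = p;

	for (int i = 0; i < args->iterations; i++) {
		lock_pair(args->a, args->b);
		shared_counter++; /* protected by both locks */
		unlock_pair(args->a, args->b);
	}
	return NULL;
}

/* Two threads name the same pair of locks in opposite order. Without
 * the ranking in lock_pair() this is the classic ABBA pattern. */
static long run_demo(int iterations)
{
	struct worker_args w1 = { RANK_SOCKS, RANK_SK_QUEUE, iterations };
	struct worker_args w2 = { RANK_SK_QUEUE, RANK_SOCKS, iterations };
	pthread_t t1, t2;

	shared_counter = 0;
	pthread_create(&t1, NULL, worker, &w1);
	pthread_create(&t2, NULL, worker, &w2);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return shared_counter;
}
```

In the kernel the ordering would be enforced by restructuring the call
paths and checked with lockdep rather than by a sorting helper like this;
the sketch only shows why a single agreed order removes the cycle.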