On Tue, 2018-01-30 at 19:21 +0800, Joseph Qi wrote: > Hi Jens and Folks, > > Recently we've gotten a hard LOCKUP issue. After investigating the issue > we've found a race between blk_init_queue_node and blkcg_print_blkgs. > The race is described below. > > blk_init_queue_node blkcg_print_blkgs > blk_alloc_queue_node (1) > q->queue_lock = &q->__queue_lock (2) > blkcg_init_queue(q) (3) > spin_lock_irq(blkg->q->queue_lock) (4) > q->queue_lock = lock (5) > spin_unlock_irq(blkg->q->queue_lock) (6) > > (1) allocate an uninitialized queue; > (2) initialize queue_lock to its default internal lock; > (3) initialize blkcg part of request queue, which will create blkg and > then insert it to blkg_list; > (4) traverse blkg_list and find the created blkg, and then take its > queue lock, here it is the default *internal lock*; > (5) *race window*, now queue_lock is overridden with *driver specified > lock*; > (6) now unlock *driver specified lock*, not the locked *internal lock*, > unlock balance breaks. > > For the issue above, I think blkcg_init_queue is a bit earlier. We > can't allow a further use before request queue is fully initialized. > Since blk_init_queue_node is a really common path and it allows driver > to override the default internal lock, I'm afraid several other places > may also have the same issue. > Am I missing something here? What code triggered the blkcg_print_blkgs() call? Was it perhaps a function related to throttling? If so, I think there are two possible ways to fix this race: 1. Moving the .queue_lock initialization from blk_init_queue_node() into blk_alloc_queue_node() such that it happens before the blkcg_init_queue() call. This however will require a tree-wide change of the blk_alloc_queue() and all blk_alloc_queue_node() calls in all block drivers. 2. Splitting the blk_throtl_init() function such that the blkcg_activate_policy() call can occur after the .queue_lock pointer has been initialized, e.g. from inside blk_init_allocated_queue() or from inside blk_register_queue() instead of from inside blkcg_init_queue(). I'm not sure what the best approach is. Maybe there is even a third approach that is better than the two approaches mentioned above. Bart.