The patch titled
     Subject: ocfs2/dlm: clean DLM_LKSB_GET_LVB and DLM_LKSB_PUT_LVB when the cancel_pending is set
has been added to the -mm tree.  Its filename is
     ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: wangjian <wangjian161@xxxxxxxxxx>
Subject: ocfs2/dlm: clean DLM_LKSB_GET_LVB and DLM_LKSB_PUT_LVB when the cancel_pending is set

dlm_move_lockres_to_recovery_list() should clear DLM_LKSB_GET_LVB and
DLM_LKSB_PUT_LVB when cancel_pending is set.  Otherwise a node may panic
in dlm_proxy_ast_handler().

Here is the situation: At the beginning, Node1 is the master of the lock
resource and has an NL lock, Node2 has a PR lock, Node3 has a PR lock and
Node4 has an NL lock.

Node1         Node2           Node3                     Node4
              convert lock_2 from
              PR to EX.

the mode of lock_3 is
PR, which blocks the
conversion request of
Node2. move lock_2 to
the conversion list.

                              convert lock_3 from
                              PR to EX.

move lock_3 to the
conversion list. send
BAST to Node3.

                              receive BAST from Node1.
                              the downconvert thread
                              starts canceling the
                              convert operation.

Node1 dies because
the host is powered
down.

                              in dlmunlock_common(), the
                              downconvert thread sets
                              cancel_pending. at the same
                              time, Node3 realizes that
                              Node1 is dead, so it moves
                              lock_3 back to the granted
                              list in
                              dlm_move_lockres_to_recovery_list()
                              and removes Node1 from the
                              domain_map in
                              __dlm_hb_node_down(). the
                              downconvert thread then fails
                              to send the lock cancellation
                              request to Node1 and returns
                              DLM_NORMAL from
                              dlm_send_remote_unlock_request().

                                                        become recovery master.

              during the recovery
              process, send lock_2,
              which is converting
              from PR to EX, to
              Node4.

                              during the recovery process,
                              send lock_3, which is on the
                              granted list and still
                              contains the DLM_LKSB_GET_LVB
                              flag, to Node4. the
                              downconvert thread then
                              clears the DLM_LKSB_GET_LVB
                              flag in dlmunlock_common().

                                                        Node4 finishes recovery.
                                                        the mode of lock_3 is
                                                        PR, which blocks the
                                                        conversion request of
                                                        Node2, so send BAST
                                                        to Node3.

                              receive BAST from Node4.
                              convert lock_3 from PR to NL.

                                                        change the mode of
                                                        lock_3 from PR to NL
                                                        and send a message to
                                                        Node3.

                              receive the message from
                              Node4. the message contains
                              the LKM_GET_LVB flag, but
                              lock->lksb->flags does not
                              contain DLM_LKSB_GET_LVB, so
                              the BUG_ON in
                              dlm_proxy_ast_handler()
                              triggers.

Link: http://lkml.kernel.org/r/bffbe5a5-acb6-652d-eb8a-99fb051e6631@xxxxxxxxxx
Signed-off-by: Jian Wang <wangjian161@xxxxxxxxxx>
Reviewed-by: Yiwen Jiang <jiangyiwen@xxxxxxxxxx>
Cc: Mark Fasheh <mark@xxxxxxxxxx>
Cc: Joel Becker <jlbec@xxxxxxxxxxxx>
Cc: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
Cc: Joseph Qi <jiangqi903@xxxxxxxxx>
Cc: Changwei Ge <ge.changwei@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/ocfs2/dlm/dlmunlock.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/ocfs2/dlm/dlmunlock.c~ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set
+++ a/fs/ocfs2/dlm/dlmunlock.c
@@ -229,6 +229,7 @@ static enum dlm_status dlmunlock_common(
 		mlog(0, "clearing convert_type at %smaster node\n",
 		     master_node ? "" : "non-");
 		lock->ml.convert_type = LKM_IVMODE;
+		lock->lksb->flags &= ~(DLM_LKSB_GET_LVB|DLM_LKSB_PUT_LVB);
 	}
 
 	/* remove the extra ref on lock */
@@ -277,6 +278,7 @@ void dlm_commit_pending_cancel(struct dl
 {
 	list_move_tail(&lock->list, &res->granted);
 	lock->ml.convert_type = LKM_IVMODE;
+	lock->lksb->flags &= ~(DLM_LKSB_GET_LVB|DLM_LKSB_PUT_LVB);
 }
 
 
_

Patches currently in -mm which might be from wangjian161@xxxxxxxxxx are

ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set.patch
ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch
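
For readers following the race in the changelog, here is a minimal
userspace sketch of the flag mismatch. It is not kernel code: every
sketch_* name, the struct layout and the bit values are placeholders
invented for illustration, and the real definitions live in the ocfs2
dlm headers. It only models the ordering described above: the cancel is
committed, the lock is sent to the recovery master, and only afterwards
is the GET_LVB bit cleared locally.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values, NOT the kernel's definitions. */
#define SKETCH_LKSB_PUT_LVB 0x1u
#define SKETCH_LKSB_GET_LVB 0x2u

/* Placeholder lock modes, NOT the kernel's definitions. */
#define SKETCH_IVMODE (-1)
#define SKETCH_PRMODE 3
#define SKETCH_EXMODE 5

/* Hypothetical, heavily reduced stand-in for a dlm lock. */
struct sketch_lock {
	int type;                /* granted mode */
	int convert_type;        /* pending conversion, or SKETCH_IVMODE */
	unsigned int lksb_flags; /* stand-in for lock->lksb->flags */
};

/* Models committing a pending cancel; with_fix also clears the LVB bits. */
void sketch_commit_pending_cancel(struct sketch_lock *lock, bool with_fix)
{
	lock->convert_type = SKETCH_IVMODE;
	if (with_fix)
		lock->lksb_flags &= ~(SKETCH_LKSB_GET_LVB | SKETCH_LKSB_PUT_LVB);
}

/*
 * Models the check on the receiving node: if the master's copy of the
 * flags asks for an LVB, the local flags must ask for it as well.
 */
bool sketch_ast_is_consistent(unsigned int master_copy, unsigned int local)
{
	if (master_copy & SKETCH_LKSB_GET_LVB)
		return (local & SKETCH_LKSB_GET_LVB) != 0;
	return true;
}

/* Replays the ordering from the changelog; returns true if no mismatch. */
bool sketch_run(bool with_fix)
{
	struct sketch_lock lock = {
		SKETCH_PRMODE, SKETCH_EXMODE, SKETCH_LKSB_GET_LVB
	};
	unsigned int master_copy;

	/* Node3 moves lock_3 back to the granted list. */
	sketch_commit_pending_cancel(&lock, with_fix);
	/* Node3 sends lock_3 to the recovery master during recovery. */
	master_copy = lock.lksb_flags;
	/* The downconvert thread clears GET_LVB only afterwards. */
	lock.lksb_flags &= ~SKETCH_LKSB_GET_LVB;
	/* The new master later sends a message built from its copy. */
	return sketch_ast_is_consistent(master_copy, lock.lksb_flags);
}

/* Small self-check covering both orderings. */
void sketch_self_check(void)
{
	assert(!sketch_run(false)); /* unpatched ordering: flags disagree */
	assert(sketch_run(true));   /* patched ordering: flags agree */
}
```

In this model sketch_run(false) reproduces the disagreement that the
changelog says trips the BUG_ON, while sketch_run(true) clears the bits
before the lock is sent, so both sides agree.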