The patch titled
     Subject: ocfs2: fix deadlock between o2hb thread and o2net_wq
has been added to the -mm tree.  Its filename is
     ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joseph Qi <joseph.qi@xxxxxxxxxx>
Subject: ocfs2: fix deadlock between o2hb thread and o2net_wq

The following case may lead to a deadlock between o2net_wq and the o2hb
thread on o2hb_callback_sem.

Suppose there are currently 2 nodes, N1 and N2, in the cluster.  N2 goes
down and, at the same time, N3 tries to join the cluster, so N1 handles
the node down (N2) and the join (N3) simultaneously.

    o2hb                               o2net_wq
    ->o2hb_do_disk_heartbeat
    ->o2hb_check_slot
    ->o2hb_run_event_list
    ->o2hb_fire_callbacks
    ->down_write(&o2hb_callback_sem)
    ->o2net_hb_node_down_cb
    ->flush_workqueue(o2net_wq)
                                       ->o2net_process_message
                                       ->dlm_query_join_handler
                                       ->o2hb_check_node_heartbeating
                                       ->o2hb_fill_node_map
                                       ->down_read(&o2hb_callback_sem)

The o2hb thread holds o2hb_callback_sem for write while it waits in
flush_workqueue() for the queued join message, and the join handler in
turn blocks in down_read() on the same semaphore.

There is no need to take o2hb_callback_sem in dlm_query_join_handler;
o2hb_live_lock is enough to protect the live node map.

Signed-off-by: Joseph Qi <joseph.qi@xxxxxxxxxx>
Cc: Mark Fasheh <mfasheh@xxxxxxxx>
Cc: Joel Becker <jlbec@xxxxxxxxxxxx>
Cc: jiangyiwen <jiangyiwen@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/ocfs2/cluster/heartbeat.c |   19 +++++++++++++++++++
 fs/ocfs2/cluster/heartbeat.h |    1 +
 fs/ocfs2/dlm/dlmdomain.c     |    2 +-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff -puN fs/ocfs2/cluster/heartbeat.c~ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq fs/ocfs2/cluster/heartbeat.c
--- a/fs/ocfs2/cluster/heartbeat.c~ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq
+++ a/fs/ocfs2/cluster/heartbeat.c
@@ -2572,6 +2572,25 @@ int o2hb_check_node_heartbeating(u8 node
 }
 EXPORT_SYMBOL_GPL(o2hb_check_node_heartbeating);
 
+int o2hb_check_node_heartbeating_no_sem(u8 node_num)
+{
+	unsigned long testing_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
+	unsigned long flags;
+
+	spin_lock_irqsave(&o2hb_live_lock, flags);
+	o2hb_fill_node_map_from_callback(testing_map, sizeof(testing_map));
+	spin_unlock_irqrestore(&o2hb_live_lock, flags);
+	if (!test_bit(node_num, testing_map)) {
+		mlog(ML_HEARTBEAT,
+		     "node (%u) does not have heartbeating enabled.\n",
+		     node_num);
+		return 0;
+	}
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(o2hb_check_node_heartbeating_no_sem);
+
 int o2hb_check_node_heartbeating_from_callback(u8 node_num)
 {
 	unsigned long testing_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
diff -puN fs/ocfs2/cluster/heartbeat.h~ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq fs/ocfs2/cluster/heartbeat.h
--- a/fs/ocfs2/cluster/heartbeat.h~ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq
+++ a/fs/ocfs2/cluster/heartbeat.h
@@ -80,6 +80,7 @@ void o2hb_fill_node_map(unsigned long *m
 void o2hb_exit(void);
 int o2hb_init(void);
 int o2hb_check_node_heartbeating(u8 node_num);
+int o2hb_check_node_heartbeating_no_sem(u8 node_num);
 int o2hb_check_node_heartbeating_from_callback(u8 node_num);
 int o2hb_check_local_node_heartbeating(void);
 void o2hb_stop_all_regions(void);
diff -puN fs/ocfs2/dlm/dlmdomain.c~ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq fs/ocfs2/dlm/dlmdomain.c
--- a/fs/ocfs2/dlm/dlmdomain.c~ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq
+++ a/fs/ocfs2/dlm/dlmdomain.c
@@ -839,7 +839,7 @@ static int dlm_query_join_handler(struct
 	 * to back off and try again.  This gives heartbeat a chance
 	 * to catch up.
 	 */
-	if (!o2hb_check_node_heartbeating(query->node_idx)) {
+	if (!o2hb_check_node_heartbeating_no_sem(query->node_idx)) {
 		mlog(0, "node %u is not in our live map yet\n",
 		     query->node_idx);
 
_

Patches currently in -mm which might be from joseph.qi@xxxxxxxxxx are

ocfs2-o2net-dont-shutdown-connection-when-idle-timeout.patch
ocfs2-o2net-set-tcp-user-timeout-to-max-value.patch
ocfs2-quorum-add-a-log-for-node-not-fenced.patch
ocfs2-dlm-fix-race-between-dispatched_work-and-dlm_lockres_grab_inflight_worker.patch
ocfs2-fix-deadlock-between-o2hb-thread-and-o2net_wq.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html