Re: [Ocfs2-devel] [PATCH] ocfs2: fix cluster hang after a node dies

Hi Changwei,

Could you share the method to reproduce the problem?

On 2017/10/17 14:48, Changwei Ge wrote:
> When a node dies, the other live nodes have to choose a new master
> for each existing lock resource mastered by the dead node.
> 
> In the ocfs2/dlm implementation, this is done by the function
> dlm_move_lockres_to_recovery_list, which marks those lock resources
> as DLM_LOCK_RES_RECOVERING and manages them via a list from which
> DLM later changes each lock resource's master.
> 
> So without invoking dlm_move_lockres_to_recovery_list, no master will
> be chosen after dlm recovery completes, since no lock resource can
> be found through the ::resource list.
> 
> What's worse, if DLM_LOCK_RES_RECOVERING is not set on lock
> resources mastered by a dead node, synchronization among nodes
> breaks.
> 
> So invoke dlm_move_lockres_to_recovery_list again.
> 
> Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
> 
> Reported-by: Vitaly Mayatskih <v.mayatskih@xxxxxxxxx>
> Signed-off-by: Changwei Ge <ge.changwei@xxxxxxx>
> ---
>   fs/ocfs2/dlm/dlmrecovery.c |    1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 74407c6..ec8f758 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>   					dlm_lockres_put(res);
>   					continue;
>   				}
> +				dlm_move_lockres_to_recovery_list(dlm, res);
>   			} else if (res->owner == dlm->node_num) {
>   				dlm_free_dead_locks(dlm, res, dead_node);
>   				__dlm_lockres_calc_usage(dlm, res);
> 
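
To make the mechanism in the commit message concrete, here is a minimal user-space sketch of it. It is not the kernel code: struct lockres, struct ctxt, RES_RECOVERING, and all three functions are hypothetical, simplified stand-ins, and the queue_for_recovery flag exists only to model the presence or absence of the call that the patch restores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real ocfs2/dlm definitions. */
#define RES_RECOVERING 0x1u            /* models DLM_LOCK_RES_RECOVERING */

struct lockres {
	uint8_t owner;                 /* node number of the current master */
	unsigned int state;            /* state flags */
	struct lockres *next_reco;     /* link in the recovery list */
};

struct ctxt {
	struct lockres *reco_list;     /* resources waiting for a new master */
};

/* Models dlm_move_lockres_to_recovery_list(): flag the resource and queue it. */
static void move_to_recovery_list(struct ctxt *c, struct lockres *res)
{
	if (res->state & RES_RECOVERING)
		return;                /* already queued */
	res->state |= RES_RECOVERING;
	res->next_reco = c->reco_list;
	c->reco_list = res;
}

/*
 * Models the local cleanup pass after dead_node dies.
 * queue_for_recovery == 0 models the behaviour without the restored call.
 */
static void local_cleanup(struct ctxt *c, struct lockres *all, size_t n,
			  uint8_t dead_node, int queue_for_recovery)
{
	for (size_t i = 0; i < n; i++) {
		if (all[i].owner == dead_node && queue_for_recovery)
			move_to_recovery_list(c, &all[i]);
	}
}

/*
 * Models the later remastering step: only resources found on the
 * recovery list receive a new master. Returns how many were remastered.
 */
static size_t finish_recovery(struct ctxt *c, uint8_t new_master)
{
	size_t remastered = 0;

	while (c->reco_list) {
		struct lockres *res = c->reco_list;

		c->reco_list = res->next_reco;
		res->next_reco = NULL;
		res->owner = new_master;
		res->state &= ~RES_RECOVERING;
		remastered++;
	}
	return remastered;
}
```

In this model, calling local_cleanup() with queue_for_recovery set to 0 leaves the recovery list empty, so finish_recovery() remasters nothing and the resource keeps the dead node as its owner, which corresponds to the hang described in the commit message.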