+ ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch added to -mm tree

The patch titled
     Subject: ocfs2/dlm: return DLM_CANCELGRANT if the lock is on granted list and the operation is canceled
has been added to the -mm tree.  Its filename is
     ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: wangjian <wangjian161@xxxxxxxxxx>
Subject: ocfs2/dlm: return DLM_CANCELGRANT if the lock is on granted list and the operation is canceled

In dlm_move_lockres_to_recovery_list(), if the lock is on the granted
queue and cancel_pending is set, a BUG_ON() is triggered.  This BUG_ON()
is unnecessary, so remove it.  A scenario that triggers it is given
below.

At the beginning, Node 1 is the master and has NL lock, Node 2 has PR
lock, Node 3 has PR lock too.

Node 1          Node 2          Node 3
             wants to get EX lock.

                             wants to get EX lock.

Node 3 blocks
Node 2 from
getting EX lock;
send Node 3 a BAST.

                             receives BAST from
                             Node 1. downconvert
                             thread begins to cancel
                             PR to EX conversion.
                             In dlmunlock_common(),
                             downconvert thread has set
                             lock->cancel_pending,
                             but has not yet entered
                             dlm_send_remote_unlock_request().

             Node 2 dies because
             the host is powered down.

In recovery process,
clean up the locks
related to Node 2,
then finish Node 3's
PR to EX request and
send Node 3 an AST.

                             receives AST from Node 1.
                             changes lock level to EX,
                             moves lock to granted list.

Node 1 dies because
the host is powered down.

                             In dlm_move_lockres_to_recovery_list(),
                             the lock is on the granted
                             queue and cancel_pending
                             is set. BUG_ON.

But after removing this BUG_ON(), the process hits a second BUG in
ocfs2_unlock_ast().  A scenario that triggers the second BUG is as
follows:

At the beginning, Node 1 is the master and has NL lock,
Node 2 has PR lock, Node 3 has PR lock too.

Node 1          Node 2          Node 3
             wants to get EX lock.

                             wants to get EX lock.

Node 3 blocks
Node 2 from
getting EX lock;
send Node 3 a BAST.

                             receives BAST from
                             Node 1. downconvert
                             thread begins to cancel
                             PR to EX conversion.
                             In dlmunlock_common(),
                             downconvert thread has released
                             lock->spinlock and res->spinlock,
                             but has not yet entered
                             dlm_send_remote_unlock_request().

             Node 2 dies because
             the host is powered down.

In recovery process,
clean up the locks
related to Node 2,
then finish Node 3's
PR to EX request and
send Node 3 an AST.

                             receives AST from Node 1.
                             changes lock level to EX,
                             moves lock to granted list,
                             sets lockres->l_unlock_action
                             to OCFS2_UNLOCK_INVALID
                             in ocfs2_locking_ast().

Node 1 dies because
the host is powered down.

                             Node 3 realizes that Node 1
                             is dead and removes Node 1 from
                             domain_map. downconvert thread
                             gets DLM_NORMAL from
                             dlm_send_remote_unlock_request()
                             and sets *call_ast to 1.
                             Then downconvert thread hits the
                             BUG in ocfs2_unlock_ast().

To avoid hitting the second BUG, dlmunlock_common() should return
DLM_CANCELGRANT if the lock is on the granted list and the operation is
a cancel.

Link: http://lkml.kernel.org/r/98f0e80c-9c13-dbb6-047c-b40e100082b1@xxxxxxxxxx
Signed-off-by: Jian Wang <wangjian161@xxxxxxxxxx>
Reviewed-by: Yiwen Jiang <jiangyiwen@xxxxxxxxxx>
Cc: Mark Fasheh <mark@xxxxxxxxxx>
Cc: Joel Becker <jlbec@xxxxxxxxxxxx>
Cc: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
Cc: Joseph Qi <jiangqi903@xxxxxxxxx>
Cc: Changwei Ge <ge.changwei@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/ocfs2/dlm/dlmrecovery.c |    1 -
 fs/ocfs2/dlm/dlmunlock.c   |    5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled
+++ a/fs/ocfs2/dlm/dlmrecovery.c
@@ -2134,7 +2134,6 @@ void dlm_move_lockres_to_recovery_list(s
 				 * if this had completed successfully
 				 * before sending this lock state to the
 				 * new master */
-				BUG_ON(i != DLM_CONVERTING_LIST);
 				mlog(0, "node died with cancel pending "
 				     "on %.*s. move back to granted list.\n",
 				     res->lockname.len, res->lockname.name);
--- a/fs/ocfs2/dlm/dlmunlock.c~ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled
+++ a/fs/ocfs2/dlm/dlmunlock.c
@@ -183,6 +183,11 @@ static enum dlm_status dlmunlock_common(
 							flags, owner);
 		spin_lock(&res->spinlock);
 		spin_lock(&lock->spinlock);
+
+		if ((flags & LKM_CANCEL) &&
+				dlm_lock_on_list(&res->granted, lock))
+			status = DLM_CANCELGRANT;
+
 		/* if the master told us the lock was already granted,
 		 * let the ast handle all of these actions */
 		if (status == DLM_CANCELGRANT) {
_

Patches currently in -mm which might be from wangjian161@xxxxxxxxxx are

ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set.patch
ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch



