Re: gnfs split brain when 1 server in 3x1 down (high load) - help request

On 10/04/20 2:06 am, Erik Jacobson wrote:
Once again thanks for sticking with us. Here is a reply from Scott
Titus. If you have something for us to try, we'd love it. The code had
your patch applied when gdb was run:


Here is the addr2line output for those addresses.  Very interesting command, of
which I was not aware.

[root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
afr_lookup_metadata_heal_check
afr-common.c:2803
[root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
afr_lookup_done
afr-common.c:2455
[root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
afr_inode_event_gen_reset
afr-common.c:755
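
As an aside on the address-to-line mapping itself: the hex values passed to
addr2line above are offsets relative to where afr.so is loaded, not absolute
addresses. A minimal, hypothetical sketch (not GlusterFS code; the library and
symbol below are stand-ins) of computing such an offset in-process with
dladdr(3):

/* offset_sketch.c - hypothetical illustration, not GlusterFS code:
 * given an address inside a loaded shared object, print the
 * object-relative offset that addr2line expects for that object.
 * Build: gcc -o offset_sketch offset_sketch.c -ldl
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Any loaded shared object works; libm's cos() is used here purely
     * as a stand-in for a function inside something like afr.so. */
    void *handle = dlopen("libm.so.6", RTLD_NOW);
    void *addr = handle ? dlsym(handle, "cos") : NULL;
    Dl_info info;

    if (!addr || dladdr(addr, &info) == 0 || !info.dli_fbase) {
        fprintf(stderr, "could not resolve a sample address\n");
        return 1;
    }

    /* This pair is what you would feed to: addr2line -f -e <object> <offset> */
    printf("%s 0x%lx\n", info.dli_fname,
           (unsigned long)((uintptr_t)addr - (uintptr_t)info.dli_fbase));
    return 0;
}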

Right, so afr_lookup_done() is resetting the event gen to zero. This looks like a race between the lookup and inode-refresh code paths. We made some changes to the event generation logic in AFR. Can you apply the attached patch and see if it fixes the split-brain issue? It should apply cleanly on glusterfs-7.4.
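
To make the shape of that race concrete, here is a tiny standalone sketch
(toy code, not AFR; every name in it is invented for illustration): one path
resets a per-inode generation counter to zero while another path samples and
later compares it to decide whether its cached view of the replicas is still
current.

/* event_gen_race_sketch.c - toy model, not GlusterFS code. */
#include <stdio.h>

struct toy_inode {
    int event_gen;            /* bumped when brick replies change */
};

/* Analogous to a lookup-completion path resetting the counter. */
static void lookup_done(struct toy_inode *ino)
{
    ino->event_gen = 0;
}

/* Analogous to an inode-refresh path: "refresh only if the counter moved". */
static int refresh_needed(const struct toy_inode *ino, int sampled_gen)
{
    return ino->event_gen != sampled_gen;
}

int main(void)
{
    struct toy_inode ino = { .event_gen = 7 };

    /* Interleaving A: sample, then reset -> refresh correctly triggered. */
    int sampled = ino.event_gen;
    lookup_done(&ino);
    printf("A: refresh_needed=%d\n", refresh_needed(&ino, sampled));

    /* Interleaving B: reset, then sample -> refresh wrongly skipped even
     * though the replicas may have diverged in the meantime. */
    ino.event_gen = 7;
    lookup_done(&ino);
    sampled = ino.event_gen;
    printf("B: refresh_needed=%d\n", refresh_needed(&ino, sampled));
    return 0;
}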

Thanks,
Ravi
From 4389908252c886c22897d8c52c0ce027a511453f Mon Sep 17 00:00:00 2001
From: Ravishankar N <ravishankar@xxxxxxxxxx>
Date: Mon, 24 Dec 2018 13:00:19 +0530
Subject: [PATCH] afr: mark pending xattrs as a part of metadata heal

...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, we can end up with xattrs inadvertently deleted from all
bricks, as explained in the BZ.

Fix:
After picking one of the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that was
picked previously.

Fixes: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N <ravishankar@xxxxxxxxxx>
---
 .../cluster/afr/src/afr-self-heal-metadata.c  | 62 ++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/xlators/cluster/afr/src/afr-self-heal-metadata.c b/xlators/cluster/afr/src/afr-self-heal-metadata.c
index f4e31b65b..03f43bad1 100644
--- a/xlators/cluster/afr/src/afr-self-heal-metadata.c
+++ b/xlators/cluster/afr/src/afr-self-heal-metadata.c
@@ -190,6 +190,59 @@ out:
     return ret;
 }
 
+static int
+__afr_selfheal_metadata_mark_pending_xattrs(call_frame_t *frame, xlator_t *this,
+                                            inode_t *inode,
+                                            struct afr_reply *replies,
+                                            unsigned char *sources)
+{
+    int ret = 0;
+    int i = 0;
+    int m_idx = 0;
+    afr_private_t *priv = NULL;
+    int raw[AFR_NUM_CHANGE_LOGS] = {0};
+    dict_t *xattr = NULL;
+
+    priv = this->private;
+    m_idx = afr_index_for_transaction_type(AFR_METADATA_TRANSACTION);
+    raw[m_idx] = 1;
+
+    xattr = dict_new();
+    if (!xattr)
+        return -ENOMEM;
+
+    for (i = 0; i < priv->child_count; i++) {
+        if (sources[i])
+            continue;
+        ret = dict_set_static_bin(xattr, priv->pending_key[i], raw,
+                                  sizeof(int) * AFR_NUM_CHANGE_LOGS);
+        if (ret) {
+            ret = -1;
+            goto out;
+        }
+    }
+
+    for (i = 0; i < priv->child_count; i++) {
+        if (!sources[i])
+            continue;
+        ret = afr_selfheal_post_op(frame, this, inode, i, xattr, NULL);
+        if (ret < 0) {
+            gf_msg(this->name, GF_LOG_INFO, -ret, AFR_MSG_SELF_HEAL_INFO,
+                   "Failed to set pending metadata xattr on child %d for %s", i,
+                   uuid_utoa(inode->gfid));
+            goto out;
+        }
+    }
+
+    afr_replies_wipe(replies, priv->child_count);
+    ret = afr_selfheal_unlocked_discover(frame, inode, inode->gfid, replies);
+
+out:
+    if (xattr)
+        dict_unref(xattr);
+    return ret;
+}
+
 /*
  * Look for mismatching uid/gid or mode or user xattrs even if
  * AFR xattrs don't say so, and pick one arbitrarily as winner. */
@@ -210,6 +263,7 @@ __afr_selfheal_metadata_finalize_source(call_frame_t *frame, xlator_t *this,
     };
     int source = -1;
     int sources_count = 0;
+    int ret = 0;
 
     priv = this->private;
 
@@ -300,7 +354,13 @@ __afr_selfheal_metadata_finalize_source(call_frame_t *frame, xlator_t *this,
             healed_sinks[i] = 1;
         }
     }
-
+    if ((sources_count == priv->child_count) && (source > -1) &&
+        (AFR_COUNT(healed_sinks, priv->child_count) != 0)) {
+        ret = __afr_selfheal_metadata_mark_pending_xattrs(frame, this, inode,
+                                                          replies, sources);
+        if (ret < 0)
+            return ret;
+    }
 out:
     afr_mark_active_sinks(this, sources, locked_on, healed_sinks);
     return source;
-- 
2.25.1
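
To observe on disk what this change does, the trusted.afr changelog xattrs of
the file can be inspected on the bricks. Below is a small hypothetical helper
(not part of GlusterFS; the volume/client name in the usage line is a
placeholder) that reads one such xattr and decodes it, assuming the usual
layout of three 32-bit network-byte-order counters in the order data,
metadata, entry:

/* afr_pending_dump.c - hypothetical helper, not part of GlusterFS.
 * Reads an AFR changelog xattr from a file on a brick and decodes it,
 * assuming three 32-bit network-byte-order counters (data/metadata/entry).
 * Usage: ./afr_pending_dump <brick-file> <xattr-name>
 *   e.g. ./afr_pending_dump /data/brick1/foo trusted.afr.vol-client-1
 */
#include <arpa/inet.h>   /* ntohl() */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>   /* getxattr() on Linux */

int main(int argc, char **argv)
{
    unsigned char buf[12];   /* 3 counters x 4 bytes */
    uint32_t counters[3];
    ssize_t len;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <brick-file> <xattr-name>\n", argv[0]);
        return 1;
    }

    len = getxattr(argv[1], argv[2], buf, sizeof(buf));
    if (len < 0) {
        perror("getxattr");
        return 1;
    }
    if (len < (ssize_t)sizeof(buf)) {
        fprintf(stderr, "unexpected xattr size %zd\n", len);
        return 1;
    }

    for (int i = 0; i < 3; i++) {
        memcpy(&counters[i], buf + 4 * i, sizeof(uint32_t));
        counters[i] = ntohl(counters[i]);
    }
    printf("data=%u metadata=%u entry=%u\n",
           counters[0], counters[1], counters[2]);
    return 0;
}

If the patched heal has marked pending xattrs before proceeding, the source
bricks should show a non-zero metadata counter in the xattr named after each
sink, which is what lets a later heal re-pick one of the same good copies if
this heal fails midway.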

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
