Re: gnfs split brain when 1 server in 3x1 down (high load) - help request

On 04/04/20 9:12 pm, Erik Jacobson wrote:
> This leaves us with afr_quorum_errno() returning the error.
>
> afr_final_errno() iterates through the 'children', looking for
> valid errors within the replies for the transaction (refresh transaction?).
> The function returns the highest valued error, which must be EIO (value of 5)
> in this case.
>
> I have not looked into how or what would set the error value in the
> replies array,

The error numbers that you see in the replies array in afr_final_errno() are set in afr_inode_refresh_subvol_cbk().

During inode refresh (which is essentially a lookup), AFR sends the lookup request to all of its connected children, and the reply from each of them is captured in afr_inode_refresh_subvol_cbk(). Adding a log there can show whether we got EIO from any of the children. See the attached patch for an example.

After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done(). But you already know this flow now.
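
To make the relationship between the per-child replies and the final error concrete, here is a minimal standalone sketch in C. It is not the AFR code: struct reply, record_reply() and final_errno() are hypothetical names invented for illustration, and the sketch models only the "highest valued error wins" rule described earlier in this thread. The real afr_final_errno() may weigh particular errors differently.

```c
#include <errno.h>
#include <stdio.h>

#define CHILD_COUNT 3 /* assumption: a 3-way replica, as in the 3x1 setup */

/* Hypothetical, simplified stand-in for one slot of the replies array. */
struct reply {
    int valid;    /* 1 once this child has answered */
    int op_ret;   /* 0 on success, -1 on failure */
    int op_errno; /* errno reported by the child when op_ret == -1 */
};

/* Stand-in for the refresh callback: store one child's reply and
 * log it if the child reported a failure. */
void
record_reply(struct reply *replies, int child, int op_ret, int op_errno)
{
    replies[child].valid = 1;
    replies[child].op_ret = op_ret;
    replies[child].op_errno = op_errno;
    if (op_ret == -1)
        fprintf(stderr, "refresh on child:%d failed with errno:%d\n",
                child, op_errno);
}

/* Walk all replies and return the numerically highest errno among the
 * failed ones, or 0 if no child reported a failure. */
int
final_errno(const struct reply *replies, int count)
{
    int result = 0;

    for (int i = 0; i < count; i++) {
        if (!replies[i].valid || replies[i].op_ret != -1)
            continue; /* skip children that did not answer or succeeded */
        if (replies[i].op_errno > result)
            result = replies[i].op_errno;
    }
    return result;
}
```

In this sketch, if child 0 succeeds, child 1 fails with ENOENT and child 2 fails with EIO, final_errno() returns EIO on Linux, because EIO (5) is numerically higher than ENOENT (2). That is why logging each child's errno in the callback shows which brick the EIO came from.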
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 4bfaef9e8..096ce06f0 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
         if (xdata)
             local->replies[call_child].xdata = dict_ref(xdata);
     }
+    if (op_ret == -1)
+        gf_msg_callingfn(
+            this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
+            "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
+            call_child, op_errno, local->loc.name,
+            uuid_utoa(local->loc.inode->gfid));
     if (xdata) {
         ret = dict_get_int8(xdata, "link-count", &need_heal);
         local->replies[call_child].need_heal = need_heal;