Re: gnfs split brain when 1 server in 3x1 down (high load) - help request


 



The patch by itself only makes changes specific to AFR, so it should not affect other translators. But I wonder how readdir-ahead is enabled in your gnfs stack. All performance xlators are turned off in gnfs except write-behind, and AFAIK there is no way to enable them via the CLI. Did you custom edit your gnfs volfile to add readdir-ahead? If yes, does the crash go away if you remove the xlator from the nfs volfile?

Regards,
Ravi
On 16/04/20 8:47 am, Erik Jacobson wrote:
It is important to note that our testing has shown zero split-brain
errors since the patch... And that it is significantly harder to
hit the seg fault than it was to hit split-brain before. It's still
sufficiently frequent that we can't let it out the door.  In my intensive
test case (found elsewhere in the thread), it would hit the problem at
least once on every run with 57 nodes. With the patch, zero split
brain, but maybe 1 in 4 runs would seg fault. We didn't have a seg
fault problem previously. This is all within the context of 1 of the 3
servers in the subvolume being down. I hit the seg fault once with just
57 nodes booting (using NFS for their root FS) and no other load.


Scott was able to take an analysis pass. Any suggestions? His words
follow:


The segfault appears to occur in readdir-ahead functionality.  We will keep
the core in case it needs to be looked at again, being sure to copy off
all necessary metadata to maintain adequate symbol lookup within gdb.
It may also be possible to breakpoint immediately prior to the segfault,
but setting the right conditions may prove to be difficult.

A bit of analysis:

Prior to the segfault, the op_errno field in a struct rda_fd_ctx
structure shows an ENOENT error.  The structure is reached from the
call_frame_t parameter of rda_fill_fd_cbk() (Backtrace #2).  The
following shows the progression from the call_frame_t parameter to the
op_errno field of the rda_fd_ctx structure.

(gdb) print {call_frame_t}0x7fe5acf18eb8
$26 = {root = 0x7fe5ac6d65f8, parent = 0x0, frames = {next =
0x7fe5ac6d6cf0, prev = 0x7fe5ac096298}, local = 0x7fe5ac1dbc78,
    this = 0x7fe63c0162b0, ret = 0x0, ref_count = 0, lock = {spinlock =
0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
          __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list =
{__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
        __align = 0}}, cookie = 0x0, complete = false, op = GF_FOP_NULL,
begin = {tv_sec = 4234, tv_nsec = 637078332}, end = {tv_sec = 4234,
      tv_nsec = 803882781}, wind_from = 0x0, wind_to = 0x0, unwind_from =
0x0, unwind_to = 0x0}

(gdb) print {struct rda_local}0x7fe5ac1dbc78
$27 = {ctx = 0x7fe5ace46590, fd = 0x7fe60433d8b8, xattrs = 0x0, inode =
0x0, offset = 0, generation = 0, skip_dir = 0}

(gdb) print {struct rda_fd_ctx}0x7fe5ace46590
$28 = {cur_offset = 0, cur_size = 638, next_offset = 1538, state = 36,
lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0,
          __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision =
0, __list = {__prev = 0x0, __next = 0x0}},
        __size = '\000' <repeats 39 times>, __align = 0}}, entries =
{{list = {next = 0x7fe60cda5f90, prev = 0x7fe60ca08190}, {
          next = 0x7fe60cda5f90, prev = 0x7fe60ca08190}}, d_ino = 0,
d_off = 0, d_len = 0, d_type = 0, d_stat = {ia_flags = 0, ia_ino = 0,
        ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0,
ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0,
        ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0,
ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0,
        ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000'
<repeats 15 times>, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000',
          sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000',
write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000',
            write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000',
write = 0 '\000', exec = 0 '\000'}}}, dict = 0x0, inode = 0x0,
      d_name = 0x7fe5ace466a8 ""}, fill_frame = 0x0, stub = 0x0, op_errno
= 2, xattrs = 0x0, writes_during_prefetch = 0x0, prefetching = {
      lk = 0x7fe5ace466d0 "", value = 0}}

The segfault occurs at the bottom of rda_fill_fd_cbk() where the rpc
call stack frames are being destroyed.  The following are what I believe
to be the three frames that are intended to be destroyed, but it is
unclear which frame is causing the problem.  If I were to dig more into
this, I would use ddd (a graphical debugger).  It's been a while since
I've done low level debugging like this, so I'm a bit rusty.

(gdb) print {call_frame_t}0x7fe5acf18eb8
$34 = {root = 0x7fe5ac6d65f8, parent = 0x0, frames = {next =
0x7fe5ac6d6cf0, prev = 0x7fe5ac096298}, local = 0x7fe5ac1dbc78,
    this = 0x7fe63c0162b0, ret = 0x0, ref_count = 0, lock = {spinlock =
0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
          __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list =
{__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>,
        __align = 0}}, cookie = 0x0, complete = false, op = GF_FOP_NULL,
begin = {tv_sec = 4234, tv_nsec = 637078332}, end = {tv_sec = 4234,
      tv_nsec = 803882781}, wind_from = 0x0, wind_to = 0x0, unwind_from =
0x0, unwind_to = 0x0}
(gdb) print {call_frame_t}0x7fe5ac6d6ce0
$35 = {root = 0x0, parent = 0x563f5a955920, frames = {next =
0x7fe5ac096298, prev = 0x7fe5acf18ec8}, local = 0x0, this = 0x108a,
    ret = 0x25f90b3c, ref_count = 0, lock = {spinlock = 0, mutex =
{__data = {__lock = 0, __count = 0, __owner = 1586972324, __nusers = 0,
          __kind = 210092664, __spins = 0, __elision = 0, __list =
{__prev = 0x0, __next = 0x0}},
        __size =
"\000\000\000\000\000\000\000\000\244F\227^\000\000\000\000x\302\205\f",
'\000' <repeats 19 times>, __align = 0}},
    cookie = 0x0, complete = false, op = GF_FOP_NULL, begin = {tv_sec =
0, tv_nsec = 0}, end = {tv_sec = 0, tv_nsec = 0}, wind_from = 0x0,
    wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) print {call_frame_t}0x7fe5ac096288
$36 = {root = 0x7fe5ac378860, parent = 0x7fe5acf18eb8, frames = {next =
0x7fe5acf18ec8, prev = 0x7fe5ac6d6cf0}, local = 0x0,
    this = 0x7fe63c014000, ret = 0x7fe63bb5d350 <rda_fill_fd_cbk>,
ref_count = 0, lock = {spinlock = 0, mutex = {__data = {__lock = 0,
          __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins =
0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
        __size = '\000' <repeats 39 times>, __align = 0}}, cookie =
0x7fe5ac096288, complete = true, op = GF_FOP_READDIRP, begin = {
      tv_sec = 4234, tv_nsec = 637078816}, end = {tv_sec = 4234, tv_nsec
= 803882755},
    wind_from = 0x7fe63bb5e8c0 <__FUNCTION__.22226> "rda_fill_fd",
wind_to = 0x7fe63bb5e3f0 "(this->children->xlator)->fops->readdirp",
    unwind_from = 0x7fe63bdd8a80 <__FUNCTION__.20442> "afr_readdir_cbk",
unwind_to = 0x7fe63bb5dfbb "rda_fill_fd_cbk"}
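
For context on why one bad frame is enough: STACK_DESTROY/FRAME_DESTROY
walk the stack's doubly linked frame list and free each frame's members
in turn. The following is a minimal sketch, assuming a heavily
simplified model of the stack.h structures (the real glusterfs code also
handles locks, memory pools, and local cleanup). If one of the linked
frames has already been freed and its memory reused, like $35 above
whose 'this' pointer is 0x108a, the walk dereferences garbage and
segfaults:

```c
/* Simplified sketch of a STACK_DESTROY-style teardown; not gluster code.
 * The structure and helper names are illustrative only. */
#include <assert.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

typedef struct call_frame {
    struct list_head frames;   /* links all frames of one call stack */
    void *local;               /* per-frame state, freed on destroy */
} call_frame_t;

static void list_add(struct list_head *e, struct list_head *head)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Destroy every frame hanging off the stack's list head.
 * Returns the number of frames destroyed. */
static int stack_destroy(struct list_head *head)
{
    int destroyed = 0;
    while (head->next != head) {
        /* frames is the first member, so the cast is valid */
        call_frame_t *frame = (call_frame_t *)head->next;
        list_del(&frame->frames);   /* unlink first ...              */
        free(frame->local);         /* ... then touch its members; a
                                     * stale frame pointer blows up
                                     * right around here             */
        free(frame);
        destroyed++;
    }
    return destroyed;
}
```

The sketch only shows the happy path; the crash in the core dump
corresponds to this loop touching a frame whose memory no longer holds a
valid call_frame_t.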



On 4/15/20 8:14 AM, Erik Jacobson wrote:
Scott - I was going to start with gluster74 since that is what he
started at, but it applies well to gluster72 so I'll start there.

Getting ready to go. The patch detail is interesting. This is probably
why it took him a bit longer. It wasn't a trivial patch.


On Wed, Apr 15, 2020 at 12:45:57PM -0500, Erik Jacobson wrote:
The new seg fault issue (correcting myself: the split brain is gone!!)
is much harder to reproduce, but after several intense runs, it usually
hits once.

We switched to pure gluster74 plus your patch so we're apples to apples
now.

I'm going to see if Scott can help debug it.

Here is the back trace info from the core dump:

-rw-r-----  1 root root 1.9G Apr 15 12:40 core.glusterfs.0.52467a7e67964553aa9971eb2bb0148c.61084.1586972324000000
-rw-r-----  1 root root 221M Apr 15 12:40 core.glusterfs.0.52467a7e67964553aa9971eb2bb0148c.61084.1586972324000000.lz4
drwxrwxrwt  9 root root  20K Apr 15 12:40 .
[root@leader3 tmp]#
[root@leader3 tmp]#
[root@leader3 tmp]# gdb core.glusterfs.0.52467a7e67964553aa9971eb2bb0148c.61084.1586972324000000
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-5.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
     <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
[New LWP 61102]
[New LWP 61085]
[New LWP 61087]
[New LWP 61117]
[New LWP 61086]
[New LWP 61108]
[New LWP 61089]
[New LWP 61090]
[New LWP 61121]
[New LWP 61088]
[New LWP 61091]
[New LWP 61093]
[New LWP 61095]
[New LWP 61092]
[New LWP 61094]
[New LWP 61098]
[New LWP 61096]
[New LWP 61097]
[New LWP 61084]
[New LWP 61100]
[New LWP 61103]
[New LWP 61104]
[New LWP 61099]
[New LWP 61105]
[New LWP 61101]
[New LWP 61106]
[New LWP 61109]
[New LWP 61107]
[New LWP 61112]
[New LWP 61119]
[New LWP 61110]
[New LWP 61111]
[New LWP 61118]
[New LWP 61123]
[New LWP 61122]
[New LWP 61113]
[New LWP 61114]
[New LWP 61120]
[New LWP 61116]
[New LWP 61115]

warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd-7.4-1.el8722.0800.200415T1052.a.rhel8hpeerikj.x86_64.debug...done.
done.

warning: Ignoring non-absolute filename: <linux-vdso.so.1>
Missing separate debuginfo for linux-vdso.so.1
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/06/44254f9cbaa826db070a796046026adba58266

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/n'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe63bb5d7bb in FRAME_DESTROY (frame=0x7fe5ac096288)
     at ../../../../libglusterfs/src/glusterfs/stack.h:193
193	        FRAME_DESTROY(frame);
[Current thread is 1 (Thread 0x7fe617fff700 (LWP 61102))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.28-42.el8.x86_64 keyutils-libs-1.5.10-6.el8.x86_64 krb5-libs-1.16.1-22.el8.x86_64 libacl-2.2.53-1.el8.x86_64 libattr-2.4.48-3.el8.x86_64 libcom_err-1.44.3-2.el8.x86_64 libgcc-8.2.1-3.5.el8.x86_64 libselinux-2.8-6.el8.x86_64 libtirpc-1.1.4-3.el8.x86_64 libuuid-2.32.1-8.el8.x86_64 openssl-libs-1.1.1-8.el8.x86_64 pcre2-10.32-1.el8.x86_64 zlib-1.2.11-10.el8.x86_64
(gdb) bt
#0  0x00007fe63bb5d7bb in FRAME_DESTROY (frame=0x7fe5ac096288)
     at ../../../../libglusterfs/src/glusterfs/stack.h:193
#1  STACK_DESTROY (stack=0x7fe5ac6d65f8)
     at ../../../../libglusterfs/src/glusterfs/stack.h:193
#2  rda_fill_fd_cbk (frame=0x7fe5acf18eb8, cookie=<optimized out>,
     this=0x7fe63c0162b0, op_ret=3, op_errno=0, entries=<optimized out>,
     xdata=0x0) at readdir-ahead.c:623
#3  0x00007fe63bd6c3aa in afr_readdir_cbk (frame=<optimized out>,
     cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
     op_errno=<optimized out>, subvol_entries=<optimized out>, xdata=0x0)
     at afr-dir-read.c:234
#4  0x00007fe6400a1e07 in client4_0_readdirp_cbk (req=<optimized out>,
     iov=<optimized out>, count=<optimized out>, myframe=0x7fe5ace0eda8)
     at client-rpc-fops_v2.c:2338
#5  0x00007fe6479ca115 in rpc_clnt_handle_reply (
     clnt=clnt@entry=0x7fe63c0663f0, pollin=pollin@entry=0x7fe60c1737a0)
     at rpc-clnt.c:764
#6  0x00007fe6479ca4b3 in rpc_clnt_notify (trans=0x7fe63c066780,
     mydata=0x7fe63c066420, event=<optimized out>, data=0x7fe60c1737a0)
     at rpc-clnt.c:931
#7  0x00007fe6479c707b in rpc_transport_notify (
     this=this@entry=0x7fe63c066780,
     event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
     data=data@entry=0x7fe60c1737a0) at rpc-transport.c:545
#8  0x00007fe640da893c in socket_event_poll_in_async (xl=<optimized out>,
     async=0x7fe60c1738c8) at socket.c:2601
#9  0x00007fe640db03dc in gf_async (
     cbk=0x7fe640da8910 <socket_event_poll_in_async>, xl=<optimized out>,
     async=0x7fe60c1738c8) at ../../../../libglusterfs/src/glusterfs/async.h:189
#10 socket_event_poll_in (notify_handled=true, this=0x7fe63c066780)
     at socket.c:2642
#11 socket_event_handler (fd=fd@entry=19, idx=idx@entry=10, gen=gen@entry=1,
     data=data@entry=0x7fe63c066780, poll_in=<optimized out>,
     poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
     at socket.c:3040
#12 0x00007fe647c84a5b in event_dispatch_epoll_handler (event=0x7fe617ffe014,
     event_pool=0x563f5a98c750) at event-epoll.c:650
#13 event_dispatch_epoll_worker (data=0x7fe63c063b60) at event-epoll.c:763
#14 0x00007fe6467a72de in start_thread () from /lib64/libpthread.so.0
#15 0x00007fe645fffa63 in clone () from /lib64/libc.so.6



On Wed, Apr 15, 2020 at 10:39:34AM -0500, Erik Jacobson wrote:
After several successful runs of the test case, we thought we were
solved. Indeed, split-brain is gone.

But we're triggering a seg fault now, even in a less loaded case.

We're going to switch to gluster74, which was your intention, and report
back.

On Wed, Apr 15, 2020 at 10:33:01AM -0500, Erik Jacobson wrote:
Attached the wrong patch by mistake in my previous mail. Sending the correct
one now.
Early results look GREAT!!

We'll keep beating on it. We applied it to gluster72 as that is what we
have to ship with. It applied fine with some line moves.

If you would like us to also run a test with gluster74 so that you can
say that's tested, we can run that test. I can do a special build.

THANK YOU!!


-Ravi


On 15/04/20 2:05 pm, Ravishankar N wrote:


     On 10/04/20 2:06 am, Erik Jacobson wrote:

         Once again thanks for sticking with us. Here is a reply from Scott
         Titus. If you have something for us to try, we'd love it. The code had
         your patch applied when gdb was run:


         Here is the addr2line output for those addresses.  Very interesting
         command, of
         which I was not aware.

         [root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/
         cluster/
         afr.so 0x6f735
         afr_lookup_metadata_heal_check
         afr-common.c:2803
         [root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/
         cluster/
         afr.so 0x6f0b9
         afr_lookup_done
         afr-common.c:2455
         [root@leader3 ~]# addr2line -f -e/usr/lib64/glusterfs/7.2/xlator/
         cluster/
         afr.so 0x5c701
         afr_inode_event_gen_reset
         afr-common.c:755


     Right, so afr_lookup_done() is resetting the event gen to zero. This looks
     like a race between lookup and inode refresh code paths. We made some
     changes to the event generation logic in AFR. Can you apply the attached
     patch and see if it fixes the split-brain issue? It should apply cleanly on
     glusterfs-7.4.
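
     The race described above can be sketched in miniature. This is a
     hedged, standalone model (the function names and flow here are
     illustrative, not the actual AFR code): event gen zero serves both
     as "uninitialized" and as the reset marker afr_lookup_done()
     writes, so a read transaction refreshing concurrently can observe
     zero and spuriously fail with EIO even though the inode is healthy:

```c
/* Deterministic single-threaded demonstration of the interleaving;
 * an assumed simplification, not gluster code. */
#include <assert.h>
#include <errno.h>

static int event_generation = 42;   /* healthy, readable inode */

/* Pre-patch afr_lookup_done() behaviour: reset event gen to zero to
 * force an inode refresh on the NEXT lookup. */
static void lookup_done_reset(void)
{
    event_generation = 0;
}

/* Pre-patch afr_txn_refresh_done() check for a read txn: fails the
 * fop if event gen is zero, even though the inode has readable
 * subvolumes. */
static int read_txn_refresh_done(void)
{
    if (event_generation == 0)
        return -EIO;   /* spurious "split-brain" */
    return 0;
}

/* Patched check: the read txn no longer consults event gen; only a
 * genuine lack of readable subvolumes would return -EIO. */
static int read_txn_refresh_done_patched(void)
{
    return 0;
}
```

     The attached patch removes the !event_generation condition from
     the read-txn check (Change #1) and replaces the reset with a
     need_refresh flag (Change #2), so the marker no longer destroys
     causal ordering.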

     Thanks,
     Ravi

________



     Community Meeting Calendar:

     Schedule -
     Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
     Bridge: https://bluejeans.com/441850968

     Gluster-users mailing list
     Gluster-users@xxxxxxxxxxx
     https://lists.gluster.org/mailman/listinfo/gluster-users

From 11601e709a97ce7c40078866bf5d24b486f39454 Mon Sep 17 00:00:00 2001
From: Ravishankar N <ravishankar@xxxxxxxxxx>
Date: Wed, 15 Apr 2020 13:53:26 +0530
Subject: [PATCH] afr: event gen changes

The general idea of the changes is to prevent resetting event generation
to zero in the inode ctx, since event gen is something that should
follow 'causal order'.

Change #1:
For a read txn, in inode refresh cbk, if event_generation is
found zero, we are failing the read fop. This is not needed
because change in event gen is only a marker for the next inode refresh to
happen and should not be taken into account by the current read txn.

Change #2:
The event gen being zero above can happen if there is a racing lookup,
which resets event gen (in afr_lookup_done) if there are non-zero afr
xattrs. The resetting is done only to trigger an inode refresh and a
possible client side heal on the next lookup. That can be achieved by
setting the need_refresh flag in the inode ctx. So replaced all
occurrences of resetting event gen to zero with a call to
afr_inode_need_refresh_set().

Change #3:
In both the lookup and discover paths, we are doing an inode refresh
which is not required since all 3 essentially do the same thing: update
the inode ctx with the good/bad copies from the brick replies. Inode
refresh also triggers background heals, but I think it is okay to do it
when we call refresh during the read and write txns and not in the
lookup path.

Change-Id: Id0600dd34b144b4ae7a3bf3c397551adf7e402f1
Signed-off-by: Ravishankar N <ravishankar@xxxxxxxxxx>
---
  ...ismatch-resolution-with-fav-child-policy.t |  8 +-
  xlators/cluster/afr/src/afr-common.c          | 92 ++++---------------
  xlators/cluster/afr/src/afr-dir-write.c       |  6 +-
  xlators/cluster/afr/src/afr.h                 |  5 +-
  4 files changed, 29 insertions(+), 82 deletions(-)

diff --git a/tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t b/tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
index f4aa351e4..12af0c854 100644
--- a/tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
+++ b/tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
@@ -168,8 +168,8 @@ TEST [ "$gfid_1" != "$gfid_2" ]
  #We know that second brick has the bigger size file
  BIGGER_FILE_MD5=$(md5sum $B0/${V0}1/f3 | cut -d\  -f1)
-TEST ls $M0/f3
-TEST cat $M0/f3
+TEST ls $M0 #Trigger entry heal via readdir inode refresh
+TEST cat $M0/f3 #Trigger data heal via readv inode refresh
  EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0
#gfid split-brain should be resolved
@@ -215,8 +215,8 @@ TEST $CLI volume start $V0 force
  EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" brick_up_status $V0 $H0 $B0/${V0}2
  EXPECT_WITHIN $CHILD_UP_TIMEOUT "1" afr_child_up_status $V0 2
-TEST ls $M0/f4
-TEST cat $M0/f4
+TEST ls $M0 #Trigger entry heal via readdir inode refresh
+TEST cat $M0/f4  #Trigger data heal via readv inode refresh
  EXPECT_WITHIN $HEAL_TIMEOUT "^0$" get_pending_heal_count $V0
#gfid split-brain should be resolved
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 61f21795e..319665a14 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -282,7 +282,7 @@ __afr_set_in_flight_sb_status(xlator_t *this, afr_local_t *local,
                  metadatamap |= (1 << index);
              }
              if (metadatamap_old != metadatamap) {
-                event = 0;
+                __afr_inode_need_refresh_set(inode, this);
              }
              break;
@@ -295,7 +295,7 @@ __afr_set_in_flight_sb_status(xlator_t *this, afr_local_t *local,
                  datamap |= (1 << index);
              }
              if (datamap_old != datamap)
-                event = 0;
+                __afr_inode_need_refresh_set(inode, this);
              break;
default:
@@ -458,34 +458,6 @@ out:
      return ret;
  }
-int
-__afr_inode_event_gen_reset_small(inode_t *inode, xlator_t *this)
-{
-    int ret = -1;
-    uint16_t datamap = 0;
-    uint16_t metadatamap = 0;
-    uint32_t event = 0;
-    uint64_t val = 0;
-    afr_inode_ctx_t *ctx = NULL;
-
-    ret = __afr_inode_ctx_get(this, inode, &ctx);
-    if (ret)
-        return ret;
-
-    val = ctx->read_subvol;
-
-    metadatamap = (val & 0x000000000000ffff) >> 0;
-    datamap = (val & 0x00000000ffff0000) >> 16;
-    event = 0;
-
-    val = ((uint64_t)metadatamap) | (((uint64_t)datamap) << 16) |
-          (((uint64_t)event) << 32);
-
-    ctx->read_subvol = val;
-
-    return ret;
-}
-
  int
  __afr_inode_read_subvol_get(inode_t *inode, xlator_t *this, unsigned char *data,
                              unsigned char *metadata, int *event_p)
@@ -556,22 +528,6 @@ out:
      return ret;
  }
-int
-__afr_inode_event_gen_reset(inode_t *inode, xlator_t *this)
-{
-    afr_private_t *priv = NULL;
-    int ret = -1;
-
-    priv = this->private;
-
-    if (priv->child_count <= 16)
-        ret = __afr_inode_event_gen_reset_small(inode, this);
-    else
-        ret = -1;
-
-    return ret;
-}
-
  int
  afr_inode_read_subvol_get(inode_t *inode, xlator_t *this, unsigned char *data,
                            unsigned char *metadata, int *event_p)
@@ -721,30 +677,22 @@ out:
      return need_refresh;
  }
-static int
-afr_inode_need_refresh_set(inode_t *inode, xlator_t *this)
+int
+__afr_inode_need_refresh_set(inode_t *inode, xlator_t *this)
  {
      int ret = -1;
      afr_inode_ctx_t *ctx = NULL;
- GF_VALIDATE_OR_GOTO(this->name, inode, out);
-
-    LOCK(&inode->lock);
-    {
-        ret = __afr_inode_ctx_get(this, inode, &ctx);
-        if (ret)
-            goto unlock;
-
+    ret = __afr_inode_ctx_get(this, inode, &ctx);
+    if (ret == 0) {
          ctx->need_refresh = _gf_true;
      }
-unlock:
-    UNLOCK(&inode->lock);
-out:
+
      return ret;
  }
int
-afr_inode_event_gen_reset(inode_t *inode, xlator_t *this)
+afr_inode_need_refresh_set(inode_t *inode, xlator_t *this)
  {
      int ret = -1;
@@ -754,7 +702,7 @@ afr_inode_event_gen_reset(inode_t *inode, xlator_t *this)
                       "Resetting event gen for %s", uuid_utoa(inode->gfid));
      LOCK(&inode->lock);
      {
-        ret = __afr_inode_event_gen_reset(inode, this);
+        ret = __afr_inode_need_refresh_set(inode, this);
      }
      UNLOCK(&inode->lock);
  out:
@@ -1187,7 +1135,7 @@ afr_txn_refresh_done(call_frame_t *frame, xlator_t *this, int err)
      ret = afr_inode_get_readable(frame, inode, this, local->readable,
                                   &event_generation, local->transaction.type);
- if (ret == -EIO || (local->is_read_txn && !event_generation)) {
+    if (ret == -EIO) {
          /* No readable subvolume even after refresh ==> splitbrain.*/
          if (!priv->fav_child_policy) {
              err = EIO;
@@ -2451,7 +2399,7 @@ afr_lookup_done(call_frame_t *frame, xlator_t *this)
          if (read_subvol == -1)
              goto cant_interpret;
          if (ret) {
-            afr_inode_event_gen_reset(local->inode, this);
+            afr_inode_need_refresh_set(local->inode, this);
              dict_del_sizen(local->replies[read_subvol].xdata, GF_CONTENT_KEY);
          }
      } else {
@@ -3007,6 +2955,7 @@ afr_discover_unwind(call_frame_t *frame, xlator_t *this)
      afr_private_t *priv = NULL;
      afr_local_t *local = NULL;
      int read_subvol = -1;
+    int ret = 0;
      unsigned char *data_readable = NULL;
      unsigned char *success_replies = NULL;
@@ -3028,7 +2977,10 @@ afr_discover_unwind(call_frame_t *frame, xlator_t *this)
      if (!afr_has_quorum(success_replies, this, frame))
          goto unwind;
- afr_replies_interpret(frame, this, local->inode, NULL);
+    ret = afr_replies_interpret(frame, this, local->inode, NULL);
+    if (ret) {
+        afr_inode_need_refresh_set(local->inode, this);
+    }
read_subvol = afr_read_subvol_decide(local->inode, this, NULL,
                                           data_readable);
@@ -3284,11 +3236,7 @@ afr_discover(call_frame_t *frame, xlator_t *this, loc_t *loc, dict_t *xattr_req)
      afr_read_subvol_get(loc->inode, this, NULL, NULL, &event,
                          AFR_DATA_TRANSACTION, NULL);
- if (afr_is_inode_refresh_reqd(loc->inode, this, event,
-                                  local->event_generation))
-        afr_inode_refresh(frame, this, loc->inode, NULL, afr_discover_do);
-    else
-        afr_discover_do(frame, this, 0);
+    afr_discover_do(frame, this, 0);
return 0;
  out:
@@ -3429,11 +3377,7 @@ afr_lookup(call_frame_t *frame, xlator_t *this, loc_t *loc, dict_t *xattr_req)
      afr_read_subvol_get(loc->parent, this, NULL, NULL, &event,
                          AFR_DATA_TRANSACTION, NULL);
- if (afr_is_inode_refresh_reqd(loc->inode, this, event,
-                                  local->event_generation))
-        afr_inode_refresh(frame, this, loc->parent, NULL, afr_lookup_do);
-    else
-        afr_lookup_do(frame, this, 0);
+    afr_lookup_do(frame, this, 0);
return 0;
  out:
diff --git a/xlators/cluster/afr/src/afr-dir-write.c b/xlators/cluster/afr/src/afr-dir-write.c
index 82a72fddd..333085b14 100644
--- a/xlators/cluster/afr/src/afr-dir-write.c
+++ b/xlators/cluster/afr/src/afr-dir-write.c
@@ -119,11 +119,11 @@ __afr_dir_write_finalize(call_frame_t *frame, xlator_t *this)
              continue;
          if (local->replies[i].op_ret < 0) {
              if (local->inode)
-                afr_inode_event_gen_reset(local->inode, this);
+                afr_inode_need_refresh_set(local->inode, this);
              if (local->parent)
-                afr_inode_event_gen_reset(local->parent, this);
+                afr_inode_need_refresh_set(local->parent, this);
              if (local->parent2)
-                afr_inode_event_gen_reset(local->parent2, this);
+                afr_inode_need_refresh_set(local->parent2, this);
              continue;
          }
diff --git a/xlators/cluster/afr/src/afr.h b/xlators/cluster/afr/src/afr.h
index a3f2942b3..ed6d777c1 100644
--- a/xlators/cluster/afr/src/afr.h
+++ b/xlators/cluster/afr/src/afr.h
@@ -958,7 +958,10 @@ afr_inode_read_subvol_set(inode_t *inode, xlator_t *this,
                            int event_generation);
int
-afr_inode_event_gen_reset(inode_t *inode, xlator_t *this);
+__afr_inode_need_refresh_set(inode_t *inode, xlator_t *this);
+
+int
+afr_inode_need_refresh_set(inode_t *inode, xlator_t *this);
int
  afr_read_subvol_select_by_policy(inode_t *inode, xlator_t *this,
--
2.25.2
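
For reference, the 64-bit read_subvol packing that the removed
__afr_inode_event_gen_reset_small() manipulated (see the deleted hunk
above) can be modeled standalone. The masks and shifts are taken from
the removed code; the helper names here are illustrative: metadata map
in bits 0-15, data map in bits 16-31, event generation in bits 32 and
up. The old reset kept both maps and zeroed only the event gen:

```c
/* Minimal standalone model of the read_subvol bit packing; the helper
 * names are hypothetical, only the layout mirrors the removed code. */
#include <assert.h>
#include <stdint.h>

static uint64_t pack(uint16_t metadatamap, uint16_t datamap, uint32_t event)
{
    return ((uint64_t)metadatamap) | (((uint64_t)datamap) << 16) |
           (((uint64_t)event) << 32);
}

/* What the removed helper did: preserve both readability maps,
 * zero only the event generation field. */
static uint64_t event_gen_reset(uint64_t val)
{
    uint16_t metadatamap = (val & 0x000000000000ffffULL) >> 0;
    uint16_t datamap = (val & 0x00000000ffff0000ULL) >> 16;
    return pack(metadatamap, datamap, 0);
}
```

The patch drops this reset entirely in favor of the need_refresh flag,
so the event gen field now only moves forward.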






