I note that this part of afr_read_txn() gets triggered a lot.

    if (afr_is_inode_refresh_reqd(inode, this, local->event_generation,
                                  event_generation)) {

Maybe that's normal when one of the three servers is down (but why
isn't it using its local copy by default?)

The comment in that if block is:

    /* servers have disconnected / reconnected, and possibly
       rebooted, very likely changing the state of freshness
       of copies */

But we have one server consistently down, not a changing situation.

Digging, digging, digging seemed to show this is related to cache
invalidation, because the paths seemed to suggest the inode needed
refreshing, and that seems to be handled by a case statement named
GF_UPCALL_CACHE_INVALIDATION.

However, that must have been a wrong turn, since turning off cache
invalidation didn't help.

I'm struggling to wrap my head around the code base, and without the
background in these concepts it's a tough hill to climb.

I'm going to have to go to bed and try this again with fresh eyes; the
machine I have easy access to is going away in the morning. I'll have
to reserve time on a contended one, but I will do that and continue
digging.

Any suggestions would be greatly appreciated as I think I'm starting to
tip over here on this one.


On Mon, Mar 30, 2020 at 04:04:39PM -0500, Erik Jacobson wrote:
> > Sadly I am not a developer, so I can't answer your questions.
>
> I'm not a FS or network developer either. I think there is a joke about
> playing one on TV but maybe it's netflix now.
>
> Enabling certain debug options made too much information for me to watch
> personally (but an expert could probably get through it).
>
> So I started putting targeted 'print' (gf_msg) statements in the code to
> see how it got its way to split-brain. Maybe this will ring a bell
> for someone.
>
> I can tell the only way we enter the split-brain path is through the
> first if statement of afr_read_txn_refresh_done().
>
> This means afr_read_txn_refresh_done() itself was passed "err" and
> that it appears thin_arbiter_count was not set (which makes sense,
> I'm using 1x3, not a thin arbiter).
>
> So we jump to the readfn label, and read_subvol should still be -1.
> If I read right, it must mean that this if didn't return true because
> my print statement didn't appear:
>     if ((ret == 0) && spb_choice >= 0) {
>
> So we're still with the original read_subvol == -1,
> which gets us to the split_brain message.
>
> So now I will try to learn why afr_read_txn_refresh_done() would have
> 'err' set in the first place. I will also learn about
> afr_inode_split_brain_choice_get(). Those seem to be the two ways we
> could have avoided falling into the split-brain hole here.
>
>
> I put debug statements in these locations. I will mark with !!!!!! what
> I see:
>
>
>
> diff -Narup glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c
> --- glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c	2020-01-15 11:43:53.887894293 -0600
> +++ glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c	2020-03-30 15:45:02.917104321 -0500
> @@ -279,10 +279,14 @@ afr_read_txn_refresh_done(call_frame_t *
>      priv = this->private;
>
>      if (err) {
> -        if (!priv->thin_arbiter_count)
> +        if (!priv->thin_arbiter_count) {
> +            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn");
> !!!!!!!!!!!!!!!!!!!!!!
> We hit this error condition and jump to readfn below
> !!!!!!!!!!!!!!!!!!!!!!!
>              goto readfn;
> -        if (err != EINVAL)
> +        }
> +        if (err != EINVAL) {
> +            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj 2nd if in afr_read_txn_refresh_done() err != EINVAL, goto readfn");
>              goto readfn;
> +        }
>          /* We need to query the good bricks and/or thin-arbiter.*/
>          afr_ta_read_txn_synctask(frame, this);
>          return 0;
> @@ -291,6 +295,8 @@ afr_read_txn_refresh_done(call_frame_t *
>      read_subvol = afr_read_subvol_select_by_policy(inode, this, local->readable,
>                                                     NULL);
>      if (read_subvol == -1) {
> +        gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg whoops read_subvol returned -1, going to readfn");
> +
>          err = EIO;
>          goto readfn;
>      }
> @@ -304,11 +310,15 @@ afr_read_txn_refresh_done(call_frame_t *
>  readfn:
>      if (read_subvol == -1) {
>          ret = afr_inode_split_brain_choice_get(inode, this, &spb_choice);
> -        if ((ret == 0) && spb_choice >= 0)
> +        if ((ret == 0) && spb_choice >= 0) {
> !!!!!!!!!!!!!!!!!!!!!!
> We never get here, afr_inode_split_brain_choice_get() must not have
> returned what was needed to enter.
> !!!!!!!!!!!!!!!!!!!!!!
> +            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg read_subvol was -1 to begin with split brain choice found: %d", spb_choice);
>              read_subvol = spb_choice;
> +        }
>      }
>
>      if (read_subvol == -1) {
> +        gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg verify this shows up above split-brain error");
> !!!!!!!!!!!!!!!!!!!!!!
> We hit here. Game over player.
> !!!!!!!!!!!!!!!!!!!!!!
> +
>          AFR_SET_ERROR_AND_CHECK_SPLIT_BRAIN(-1, err);
>      }
>      afr_read_txn_wind(frame, this, read_subvol);
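
For anyone trying to follow the path traced in the quoted diff, here is a minimal standalone sketch of just the branch logic visible in those hunks. It is not GlusterFS code: struct refresh_state, struct refresh_result, enum outcome, and model_refresh_done() are hypothetical names made up for illustration, the real lookups are replaced by plain input fields, and whatever the diff elides between hunks is skipped. It only assumes the control flow is as the hunks show it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the values the quoted hunks consult. */
struct refresh_state {
    int err;                /* error handed to the refresh-done callback */
    int thin_arbiter_count; /* 0 on a plain 1x3 replica volume */
    int policy_subvol;      /* result of select-by-policy, -1 if none */
    int spb_ret;            /* return value of the split-brain-choice lookup */
    int spb_choice;         /* subvolume chosen by that lookup, -1 if none */
};

enum outcome { WIND_READ, SPLIT_BRAIN_ERROR, ASK_THIN_ARBITER };

struct refresh_result {
    enum outcome outcome;
    int read_subvol; /* subvolume the read would be wound to, or -1 */
    int err;         /* error value carried into the final check */
};

/* Mirrors only the branches shown in the diff; elided code is not modeled. */
static struct refresh_result model_refresh_done(const struct refresh_state *s)
{
    struct refresh_result r = { WIND_READ, -1, 0 };

    assert(s != NULL);
    r.err = s->err;

    if (r.err) {
        if (!s->thin_arbiter_count)
            goto readfn;            /* first annotated hit in the diff */
        if (r.err != EINVAL)
            goto readfn;
        r.outcome = ASK_THIN_ARBITER;
        return r;
    }

    r.read_subvol = s->policy_subvol;
    if (r.read_subvol == -1) {
        r.err = EIO;
        goto readfn;
    }

readfn:
    if (r.read_subvol == -1) {
        /* Only an explicit split-brain choice can rescue the read here. */
        if (s->spb_ret == 0 && s->spb_choice >= 0)
            r.read_subvol = s->spb_choice;
    }

    r.outcome = (r.read_subvol == -1) ? SPLIT_BRAIN_ERROR : WIND_READ;
    return r;
}
```

Feeding this a nonzero err, thin_arbiter_count of 0, and no split-brain choice lands on SPLIT_BRAIN_ERROR with read_subvol still -1, which matches the annotated trace. In this model the only two ways out are err being 0 on entry or the choice lookup returning a usable subvolume.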