My co-worker prefers I keep driving the discussion since this isn't his normal area. But he's far better at digging in to these low level calls than I am. So I am passing along his analysis so far. We are wondering if we have enough information yet to turn on any light bulbs in terms of volume settings, system settings, or a code change... Or a suggested path for further debug. Recall from earlier in the thread, this a 3-way replicate single subvolume gluster fileystem that gets split-brain errors under heavy gnfs load when one of the three servers has gone down, representing a customer-reported problem. Scott's analysis is below. Next steps truly appreciated !! Apologies for misinterpreting the backtrace. #0 afr_read_txn_refresh_done (frame=0x7ffcf4146478, this=0x7fff64013720, err=5) at afr-read-txn.c:312 #1 0x00007fff68938d2b in afr_txn_refresh_done (frame=frame@entry=0x7ffcf4146478, this=this@entry=0x7fff64013720, err=5, err@entry=0) at afr-common.c:1222 #2 0x00007fff68939003 in afr_inode_refresh_done (frame=frame@entry=0x7ffcf4146478, this=this@entry=0x7fff64013720, error=0) at afr-common.c:1294 instead of the #1/#2 above calling the functions afr_txn_refresh_done and afr_inode_refresh_done respectively, they are calling a function within afr_txn_refresh_done and afr_inode_refresh_done respectively. So, afr_txn_refresh_done (frame=frame@entry=0x7ffcf4146478, this=this@entry=0x7fff64013720, err=5, err@entry=0)at afr-common.c:1222calls a function at line number 1222 in aft-common.c within the function afr_txn_refresh_done: 1163: int 1164: afr_txn_refresh_done(call_frame_t *frame, xlator_t *this, int err) 1165: { 1166: call_frame_t *heal_frame = NULL; 1167: afr_local_t *heal_local = NULL; 1168: afr_local_t *local = NULL; 1169: afr_private_t *priv = NULL; 1170: inode_t *inode = NULL; 1171: int event_generation = 0; 1172: int read_subvol = -1; 1173: int ret = 0; 1174: 1175: local = frame->local; 1176: inode = local->inode; 1177: priv = this->private; 1178: 1179: if (err) 1180: goto refresh_done; 1181: 1182: if (local->op == GF_FOP_LOOKUP) 1183: goto refresh_done; 1184: 1185: ret = afr_inode_get_readable(frame, inode, this, local->readable, 1186: &event_generation, local->transaction.type); 1187: 1188: if (ret == -EIO || (local->is_read_txn && !event_generation)) { 1189: /* No readable subvolume even after refresh ==> splitbrain.*/ *1190: ** if (!priv->fav_child_policy) {* *1191: err = EIO; **1192: goto refresh_done; **1193: ** }* 1194: read_subvol = afr_sh_get_fav_by_policy(this, local->replies, inode, 1195: NULL); 1196: if (read_subvol == -1) { 1197: err = EIO; 1198: goto refresh_done; 1199: } 1200: 1201: heal_frame = afr_frame_create(this, NULL); 1202: if (!heal_frame) { 1203: err = EIO; 1204: goto refresh_done; 1205: } 1206: heal_local = heal_frame->local; 1207: heal_local->xdata_req = dict_new(); 1208: if (!heal_local->xdata_req) { 1209: err = EIO; 1210: AFR_STACK_DESTROY(heal_frame); 1211: goto refresh_done; 1212: } 1213: heal_local->heal_frame = frame; 1214: ret = synctask_new(this->ctx->env, afr_fav_child_reset_sink_xattrs, 1215: afr_fav_child_reset_sink_xattrs_cbk, heal_frame, 1216: heal_frame); 1217: return 0; 1218: } 1219: 1220: refresh_done: 1221: afr_local_replies_wipe(local, this->private); *1222: local->refreshfn(frame, this, err);* 1223: 1224: return 0; 1225: } So, backtrace #1 represents the following function call local->refreshfn(frame=frame@entry=0x7ffcf4146478, this=this@entry=0x7fff64013720, err=5, err@entry=0) This is the 1st example of EIO being set. Setting a breakpoint at *1190: ** if (!priv->fav_child_policy) { *reveals that ret is not set, but local->is_read_txn is set and event_generation is zero (xlators/cluster/afr/src/afr.h:108), so the conditional at 1188 is true. Furthermore, priv->fav_child_policy is set to AFR_FAV_CHILD_NONE which is zero, so we found where the error is set to EIO, line 1191. The following is the gdb output: (gdb) print ret $1 = 0 (gdb) print local->is_read_txn $2 = true (gdb) print event_generation $3 = 0 (gdb) print priv->fav_child_policy $4 = AFR_FAV_CHILD_NONE I am not sure what this signifies though. It appears to be a read transaction with no event generation and no favorite child policy. Feel free to ask for clarification in case my thought process went awry somewhere. Thanks, -Scott On Thu, Apr 02, 2020 at 02:02:46AM -0500, Erik Jacobson wrote: > > Hmm, afr_inode_refresh_done() is called with error=0 and by the time we > > reach afr_txn_refresh_done(), it becomes 5(i.e. EIO). > > So afr_inode_refresh_done() is changing it to 5. Maybe you can put > > breakpoints/ log messages in afr_inode_refresh_done() at the places where > > error is getting changed and see where the assignment happens. > > I had a lot of struggles tonight getting the system ready to go. I had > seg11's in glusterfs(nfs) but I think it was related to not all brick > processes stopping with glusterd. I also re-installed and/or the print > statements. I'm not sure. I'm not used to seeing that. > > I put print statements everywhere I thought error could change and got > no printed log messages. > > I put break points where error would change and we didn't hit them. > > I then point a breakpoint at > > break xlators/cluster/afr/src/afr-common.c:1298 if error != 0 > > ---> refresh_done: > afr_txn_refresh_done(frame, this, error); > > And it never triggered (despite split-brain messages and my crapola > message). > > So I'm unable to explain this transition. I'm also not a gdb expert. > I still see the same back trace though. > > #1 0x00007fff68938d7b in afr_txn_refresh_done ( > frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720, err=5, > err@entry=0) at afr-common.c:1222 > #2 0x00007fff689391f0 in afr_inode_refresh_done ( > frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720, error=0) > at afr-common.c:1299 > > Is there other advice you might have for me to try? > > I'm very eager to solve this problem, which is why I'm staying up late > to get machine time. I must go to bed now. I look forward to another > shot tomorrow night if you have more ideas to try. > > Erik Erik Jacobson Software Engineer erik.jacobson@xxxxxxx +1 612 851 0550 Office Eagan, MN hpe.com ________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@xxxxxxxxxxx https://lists.gluster.org/mailman/listinfo/gluster-users