On Tue, Aug 17, 2021 at 11:14 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Tue, 2021-08-17 at 10:56 -0700, Gregory Farnum wrote:
> > While I agree we don’t want to crash on unexpected input from the
> > network, if we are tossing out a map we need to shut down the mount as
> > well. If we think the system metadata is invalid, that’s not really a
> > recoverable condition and continuing to do IO is a mistake from the
> > whole-system perspective — either the server has failed horribly or
> > there’s something the client doesn’t understand which may be critical
> > to correctness; either way there's a big problem with the basic system
> > operation. (I mean, if we hit this point obviously the server has
> > failed horribly since we should have gated it, but it may have failed
> > horribly in some non-code-logic fashion.)
> > -Greg
> >
> I see this as essentially the same as any other parsing error in the
> mdsmap. When we hit one of those, we currently just do this:
>
> pr_err("error decoding fsmap\n");
>
> ...and soldier on. It's been this way since the beginning, afaict.

Oh. That's, uh, interesting. I mean, you're right, this case isn't
any more special. I just didn't know that's how the kernel client
handles it. (The userspace client inherits the usual userspace decode
logic and any accompanying asserts.)

> If we want to do something more involved there, then that could probably
> be done, but it's not as simple as throwing a switch. We may have open
> files and dirty data to deal with. We do have some code to deal with
> attempting to reconnect after a blacklist event, so you might be able to
> treat this similarly.

Hmm, my guess is this only happens if the MDS is spewing nonsense out
over its pipe, or we've made a logic error and let a client join
across a non-backwards-compatible encoding/feature change. I think we
probably just start throwing EIO and don't try to remount, rather
than going for anything more polite. *shrug*

> In any case, this would be a pretty unusual situation, and I don't see
> us having the manpower to spend on coding up an elegant solution to this
> potential problem anytime soon. It might be worth opening a tracker for
> it though if that changes in the future.

Makes sense. Ticket done: https://tracker.ceph.com/issues/52303
-Greg
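
To make the ticket a bit more concrete, roughly the shape I have in
mind is sketched below. This is only a sketch -- ceph_mark_mount_fatal()
and CEPH_MOUNT_FATAL are made-up names, not real fs/ceph definitions,
and I'm hand-waving where the flag actually lives:

	/*
	 * Sketch only: remember that we were handed system metadata we
	 * cannot trust, so this mount should stop doing I/O.
	 */
	static void ceph_mark_mount_fatal(struct ceph_fs_client *fsc)
	{
		WRITE_ONCE(fsc->mount_state, CEPH_MOUNT_FATAL);
	}

	/* in the map decode error path, instead of only logging: */
	pr_err("error decoding fsmap\n");
	ceph_mark_mount_fatal(fsc);

	/* ...and at the top of the I/O entry points, fail fast: */
	if (READ_ONCE(fsc->mount_state) == CEPH_MOUNT_FATAL)
		return -EIO;

That is, once we decide a map is garbage we flag the mount and every
new request just gets EIO, rather than trying to limp along or
attempting a remount.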

> >
> > >
> > > > +			export_targets[m_info->export_targets[i]] = 1;
> > > > +		}
> > > > +	}
> > > > +
> > > >  	for (i = 0; i < oldmap->possible_max_rank && i < mdsc->max_sessions; i++) {
> > > >  		if (!mdsc->sessions[i])
> > > >  			continue;
> > > > @@ -4242,6 +4253,8 @@ static void check_new_map(struct ceph_mds_client *mdsc,
> > > >  		if (s->s_state == CEPH_MDS_SESSION_RESTARTING &&
> > > >  		    newstate >= CEPH_MDS_STATE_RECONNECT) {
> > > >  			mutex_unlock(&mdsc->mutex);
> > > > +			if (export_targets)
> > > > +				export_targets[i] = 0;
> > > >  			send_mds_reconnect(mdsc, s);
> > > >  			mutex_lock(&mdsc->mutex);
> > > >  		}
> > > > @@ -4264,6 +4277,47 @@ static void check_new_map(struct ceph_mds_client *mdsc,
> > > >  		}
> > > >  	}
> > > > 
> > > > +	for (i = 0; i < newmap->possible_max_rank; i++) {
> > >
> > > The condition on this loop is slightly different from the one below it,
> > > and I'm not sure why. Should this also be checking this?
> > > i < newmap->possible_max_rank && i < mdsc->max_sessions
> > > ...do we need to look at export targets where i >= mdsc->max_sessions ?
> > > > +		if (!export_targets)
> > > > +			break;
> > > > +
> > > > +		/*
> > > > +		 * Only open and reconnect sessions that don't
> > > > +		 * exist yet.
> > > > +		 */
> > > > +		if (!export_targets[i] || __have_session(mdsc, i))
> > > > +			continue;
> > > > +
> > > > +		/*
> > > > +		 * If the exporting MDS crashed just after the
> > > > +		 * EImportStart journal was flushed, then when a
> > > > +		 * standby MDS takes over and replays the
> > > > +		 * EImportStart journal, the new MDS daemon will
> > > > +		 * wait for the client to reconnect, but the
> > > > +		 * client may never have registered/opened a
> > > > +		 * session to it yet.
> > > > +		 *
> > > > +		 * So also send a reconnect to any MDS that is in
> > > > +		 * the export targets and in the RECONNECT state.
> > > > +		 */
> > > > +		newstate = ceph_mdsmap_get_state(newmap, i);
> > > > +		if (newstate != CEPH_MDS_STATE_RECONNECT)
> > > > +			continue;
> > > > +		s = __open_export_target_session(mdsc, i);
> > > > +		if (IS_ERR(s)) {
> > > > +			err = PTR_ERR(s);
> > > > +			pr_err("failed to open export target session, err %d\n",
> > > > +			       err);
> > > > +			continue;
> > > > +		}
> > > > +		dout("send reconnect to target mds.%d\n", i);
> > > > +		mutex_unlock(&mdsc->mutex);
> > > > +		send_mds_reconnect(mdsc, s);
> > > > +		mutex_lock(&mdsc->mutex);
> > > > +		ceph_put_mds_session(s);
> > > Suppose we end up in this part of the code, and we have to drop the
> > > mdsc->mutex like this. What ensures that an earlier session in the array
> > > won't end up going back into CEPH_MDS_STATE_RECONNECT before we can get
> > > into the loop below? This looks racy.
> > > > +	}
> > > > +
> > > >  	for (i = 0; i < newmap->possible_max_rank && i < mdsc->max_sessions; i++) {
> > > >  		s = mdsc->sessions[i];
> > > >  		if (!s)
> > > > @@ -4278,6 +4332,8 @@ static void check_new_map(struct ceph_mds_client *mdsc,
> > > >  			__open_export_target_sessions(mdsc, s);
> > > >  		}
> > > >  	}
> > > > +
> > > > +	kfree(export_targets);
> > > >  }
> > > > 
> > >
> > > --
> > > Jeff Layton <jlayton@xxxxxxxxxx>

> >
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>