On Sat, 2024-07-13 at 02:05 -0400, Benjamin Marzinski wrote: > multipathd's path checking can be very bursty, with all the paths > running their path checkers at the same time, and then all doing > nothing > while waiting for the next check time. Alternatively, the paths in a > multipath device might all run their path checkers at a different > time, > which can keep the multipath device from having a coherent view of > the > state of all of its paths. > > This patch makes all the paths of a multipath device converge to > running > their checkers at the same time, and spreads out when this time is > for > the different multipath devices. > > To do this, the checking time is divided into adjustment intervals > (conf->adjust_int), so that the checkers run at some index within > this > interval. conf->adjust_int is chosen so that it is at least twice as > long as conf->max_checkint, and is a multiple of all possible > pp->checkint values. This means that regardless of pp->checkint, the > path should always be checked on the same indexes, each adjustement > interval. > > Each multipath device has a goal index. These are evenly spread out > between 0 and conf->max_checkint. Every conf->adjust_int seconds, > each > multipath device should try to check all of its paths on its goal > index. > If a path isn't checked on the goal index, then pp->tick for the next > check after the goal_idx will be decremented by one, to align it over > time. > > In order for the path checkers to run every pp->checkint seconds, > multipathd needs to track how long a path check has been pending for, > and subtract that time from the number of ticks till the checker is > run > again. If the checker has been pending for more that pp->checkint, > the path will be rechecked on the next tick after the checker > returns. > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> Nice, thanks a lot. I have a few remarks below. > --- > libmultipath/config.c | 13 ++++++ > libmultipath/config.h | 1 + > libmultipath/structs.c | 1 + > libmultipath/structs.h | 1 + > multipathd/main.c | 90 ++++++++++++++++++++++++++++++++++------ > -- > 5 files changed, 89 insertions(+), 17 deletions(-) > > diff --git a/libmultipath/config.c b/libmultipath/config.c > index 83fa7369..ca584372 100644 > --- a/libmultipath/config.c > +++ b/libmultipath/config.c > @@ -982,6 +982,19 @@ int _init_config (const char *file, struct > config *conf) > conf->checkint = conf->max_checkint; > condlog(3, "polling interval: %d, max: %d", > conf->checkint, conf->max_checkint); > + /* > + * make sure that that adjust_int is at least twice as large > + * as checkint and is a multiple of all possible values of > + * pp->checkint. > + */ > + if (conf->max_checkint % conf->checkint == 0) { > + conf->adjust_int = 2 * conf->max_checkint; > + } else { > + conf->adjust_int = conf->checkint; > + while (2 * conf->adjust_int < conf->max_checkint) > + conf->adjust_int *= 2; > + conf->adjust_int *= conf->max_checkint; > + } > > if (conf->blist_devnode == NULL) { > conf->blist_devnode = vector_alloc(); > diff --git a/libmultipath/config.h b/libmultipath/config.h > index 384193ab..800c0ca9 100644 > --- a/libmultipath/config.h > +++ b/libmultipath/config.h > @@ -147,6 +147,7 @@ struct config { > int minio_rq; > unsigned int checkint; > unsigned int max_checkint; > + unsigned int adjust_int; > bool use_watchdog; > int pgfailback; > int rr_weight; > diff --git a/libmultipath/structs.c b/libmultipath/structs.c > index 1583e001..472e1001 100644 > --- a/libmultipath/structs.c > +++ b/libmultipath/structs.c > @@ -148,6 +148,7 @@ uninitialize_path(struct path *pp) > pp->dmstate = PSTATE_UNDEF; > pp->uid_attribute = NULL; > pp->checker_timeout = 0; > + pp->pending_ticks = 0; > > if (checker_selected(&pp->checker)) > checker_put(&pp->checker); > diff --git a/libmultipath/structs.h b/libmultipath/structs.h > index 002eeae1..457d7836 100644 > --- a/libmultipath/structs.h > +++ b/libmultipath/structs.h > @@ -360,6 +360,7 @@ struct path { > unsigned long long size; > unsigned int checkint; > unsigned int tick; > + unsigned int pending_ticks; > int bus; > int offline; > int state; > diff --git a/multipathd/main.c b/multipathd/main.c > index daf668eb..42ccb92f 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -2386,7 +2386,7 @@ check_mpp(struct vectors * vecs, struct > multipath *mpp, unsigned int ticks) > * and '0' otherwise > */ > static int > -check_path (struct vectors * vecs, struct path * pp, unsigned int > ticks) > +do_check_path (struct vectors * vecs, struct path * pp) > { > int newstate; > int new_path_up = 0; > @@ -2398,14 +2398,6 @@ check_path (struct vectors * vecs, struct path > * pp, unsigned int ticks) > int marginal_pathgroups, marginal_changed = 0; > bool need_reload; > > - if (pp->initialized == INIT_REMOVED) > - return 0; > - > - if (pp->tick) > - pp->tick -= (pp->tick > ticks) ? ticks : pp->tick; > - if (pp->tick) > - return 0; /* don't check this path yet */ > - > conf = get_multipath_config(); > checkint = conf->checkint; > max_checkint = conf->max_checkint; > @@ -2417,12 +2409,6 @@ check_path (struct vectors * vecs, struct path > * pp, unsigned int ticks) > pp->checkint = checkint; > }; > > - /* > - * provision a next check soonest, > - * in case we exit abnormally from here > - */ > - pp->tick = checkint; > - > newstate = check_path_state(pp); > if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) > return 0; > @@ -2584,7 +2570,6 @@ check_path (struct vectors * vecs, struct path > * pp, unsigned int ticks) > condlog(4, "%s: delay next check > %is", > pp->dev_t, pp->checkint); > } > - pp->tick = pp->checkint; > } > } > else if (newstate != PATH_UP && newstate != PATH_GHOST) { > @@ -2637,6 +2622,76 @@ check_path (struct vectors * vecs, struct path > * pp, unsigned int ticks) > return 1; > } > > +static int > +check_path (struct vectors * vecs, struct path * pp, unsigned int > ticks, > + time_t start_secs) > +{ > + int r; > + unsigned int adjust_int, max_checkint; > + struct config *conf; > + time_t next_idx, goal_idx; > + > + if (pp->initialized == INIT_REMOVED) > + return 0; > + > + if (pp->tick) > + pp->tick -= (pp->tick > ticks) ? ticks : pp->tick; > + if (pp->tick) > + return 0; /* don't check this path yet */ > + > + conf = get_multipath_config(); > + max_checkint = conf->max_checkint; > + adjust_int = conf->adjust_int; > + put_multipath_config(conf); > + > + r = do_check_path(vecs, pp); > + > + /* > + * do_check_path() removed or orphaned the path. > + */ > + if (r < 0 || !pp->mpp) > + return r; > + > + /* > + * the path checker is pending > + */ > + if (pp->tick != 0) { I'm not getting this logic. Please explain. Why is pp->tick != 0 at this point equivalent to the checker being pending? > + pp->pending_ticks++; > + return r; > + } > + > + /* schedule the next check */ > + pp->tick = pp->checkint; > + if (pp->pending_ticks >= pp->tick) > + pp->tick = 1; > + else > + pp->tick -= pp->pending_ticks; > + pp->pending_ticks = 0; > + > + if (pp->tick == 1) > + return r; > + > + /* > + * every mpp has a goal_idx in the range of > + * 0 <= goal_idx < conf->max_checkint > + * > + * The next check has an index, next_idx, in the range of > + * 0 <= next_idx < conf->adjust_int > + * > + * Every adjust_int seconds, each multipath device should > try to > + * check all of its paths on its goal_idx. If a path isn't > checked > + * on the goal_idx, then pp->tick for the next check after > the > + * goal_idx will be decremented by one, to align it over > time. > + */ > + goal_idx = (find_slot(vecs->mpvec, pp->mpp)) * > + max_checkint / VECTOR_SIZE(vecs->mpvec); > + next_idx = (start_secs + pp->tick) % adjust_int; > + if (next_idx > goal_idx && next_idx - goal_idx < pp- > >checkint) > + pp->tick--; Hm. IMHO this calculation should be little simpler, as follows: cur_idx = start_secs % adjust_int; if ((goal_idx - cur_idx) % pp->checkint != 0) pp->tick--; My reasoning goes like this: If (goal_idx - cur_idx) % pp->checkint is 0, we need no adjustment, because (assuming checkint remains constant) it will be checked at the next goal_idx. Otherwise, we check one tick earlier, and will do so at the next check again, etc. This is similar to your approach, but not exactly the same, AFAICS. In particular, I am unsure about your "if (next_idx - goal_idx < pp->checkint)" condition. If checkint grows toward max_checkint, it may take longer until this algorithm converges, but sooner or later, it should. Instead of stepping, we could also force the idx immediately, by doing pp->tick -= (goal_idx - cur_idx) % pp->checkint; which would be even simpler (the check intervals would be less evenly spaced, but I don't think that would hurt us much). Am I misguided here? Regards Martin