On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote: > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote: > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski wrote: > > > On Wed, May 31, 2023 at 04:27:30PM +0000, Martin Wilck wrote: > > > > On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote: > > > > > need_switch_pathgroup() only checks if the currently used > > > > > pathgroup > > > > > is > > > > > not the highest priority pathgroup. If it isn't, all multipathd > > > > > does > > > > > is > > > > > instruct the kernel to switch to the correct pathgroup. > > > > > However, the > > > > > kernel treats the pathgroups as if they were ordered by > > > > > priority. > > > > > When > > > > > the kernel runs out of paths to use in the currently selected > > > > > pathgroup, > > > > > it will start checking the pathgroups in order until it finds > > > > > one > > > > > with > > > > > usable paths. > > > > > > > > > > need_switch_pathgroup() should also check if the pathgroups are > > > > > out > > > > > of > > > > > order, and if so, multipathd should reload the map to reorder > > > > > them > > > > > correctly. > > > > > > > > > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx> > > > > > --- > > > > > libmultipath/libmultipath.version | 5 ++++ > > > > > libmultipath/switchgroup.c | 27 ++++++++++++++++++++++ > > > > > libmultipath/switchgroup.h | 1 + > > > > > multipathd/main.c | 38 +++++++++++++++++++++-- > > > > > ------ > > > > > -- > > > > > 4 files changed, 59 insertions(+), 12 deletions(-) > > > > > > > > > > diff --git a/libmultipath/libmultipath.version > > > > > b/libmultipath/libmultipath.version > > > > > index 8f72c452..38074699 100644 > > > > > --- a/libmultipath/libmultipath.version > > > > > +++ b/libmultipath/libmultipath.version > > > > > @@ -237,3 +237,8 @@ global: > > > > > local: > > > > > *; > > > > > }; > > > > > + > > > > > +LIBMULTIPATH_19.1.0 { > > > > > +global: > > > > > + path_groups_in_order; > > > > > +} LIBMULTIPATH_19.0.0; > > > > > diff --git a/libmultipath/switchgroup.c > > > > > b/libmultipath/switchgroup.c > > > > > index b1e1f39b..b1180839 100644 > > > > > --- a/libmultipath/switchgroup.c > > > > > +++ b/libmultipath/switchgroup.c > > > > > @@ -7,6 +7,33 @@ > > > > > #include "structs.h" > > > > > #include "switchgroup.h" > > > > > > > > > > +bool path_groups_in_order(struct multipath *mpp) > > > > > +{ > > > > > + int i; > > > > > + struct pathgroup *pgp; > > > > > + bool seen_marginal_pg = false; > > > > > + int last_prio = INT_MAX; > > > > > + > > > > > + if (VECTOR_SIZE(mpp->pg) < 2) > > > > > + return true; > > > > > + > > > > > + vector_foreach_slot(mpp->pg, pgp, i) { > > > > > + /* skip pgs with PRIO_UNDEF, since this is > > > > > likely > > > > > temporary */ > > > > > + if (!pgp->paths || pgp->priority == PRIO_UNDEF) > > > > > + continue; > > > > > + if (pgp->marginal && !seen_marginal_pg) { > > > > > + last_prio = INT_MAX; > > > > > + continue; > > > > > + } > > > > > + if (seen_marginal_pg && !pgp->marginal) > > > > > + return false; > > > > > + if (pgp->priority > last_prio) > > > > > + return false; > > > > > + last_prio = pgp->priority; > > > > > + } > > > > > + return true; > > > > > +} > > > > > + > > > > > void path_group_prio_update(struct pathgroup *pgp) > > > > > { > > > > > int i; > > > > > diff --git a/libmultipath/switchgroup.h > > > > > b/libmultipath/switchgroup.h > > > > > index 9365e2e2..43dbb6c9 100644 > > > > > --- a/libmultipath/switchgroup.h > > > > > +++ b/libmultipath/switchgroup.h > > > > > @@ -1,2 +1,3 @@ > > > > > void path_group_prio_update (struct pathgroup * pgp); > > > > > int select_path_group (struct multipath * mpp); > > > > > +bool path_groups_in_order(struct multipath *mpp); > > > > > diff --git a/multipathd/main.c b/multipathd/main.c > > > > > index e7c272ad..2ea7c76b 100644 > > > > > --- a/multipathd/main.c > > > > > +++ b/multipathd/main.c > > > > > @@ -396,7 +396,7 @@ void > > > > > put_multipath_config(__attribute__((unused)) > > > > > void *arg) > > > > > } > > > > > > > > > > static int > > > > > -need_switch_pathgroup (struct multipath * mpp, int refresh) > > > > > +need_switch_pathgroup (struct multipath * mpp, int refresh, > > > > > bool > > > > > *need_reload) > > > > > { > > > > > struct pathgroup * pgp; > > > > > struct path * pp; > > > > > @@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath * > > > > > mpp, > > > > > int refresh) > > > > > struct config *conf; > > > > > int bestpg; > > > > > > > > > > + *need_reload = false; > > > > > if (!mpp) > > > > > return 0; > > > > > > > > > > @@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath * > > > > > mpp, > > > > > int refresh) > > > > > return 0; > > > > > > > > > > mpp->bestpg = bestpg; > > > > > - if (mpp->bestpg != mpp->nextpg) > > > > > - return 1; > > > > > + *need_reload = !path_groups_in_order(mpp); > > > > > > > > This will start another loop over the path groups. Can we just > > > > integrate the path_groups_in_order() logic into the loop right > > > > here? > > > > > > > > > > Sure > > > > Actually, after looking into this more, pushing those two functions > > together makes the logic more confusing. Plus select_path_group() is > > used by multiple other functions that don't need to check if the path > > groups are out of order. > > Hm. Can it happen at all that select_path_group() returns something > other than 1 but path_groups_in_order() returns true? Yes. It might even be the common case. Say a switch goes down and all the paths in the high priority pathgroup fail. The kernel will switch over to a lower priority pathgroup. As long as those paths work, it won't automatically switch back to the high priority pathgroup when we tell it that those failed paths have recovered. It's multipath's job to tell it when to proactively switch pathgroups. Since multipath doesn't update the priority of failed paths, the pathgroups should still look the same (unless you use group_by_prio and the path fails between checking the state and running the prioritizer, in which case you will likely get a PRIO_UNDEF and reconfigure the pathgroups, but that's the thing group_by_tpg is trying to resolve). > If we follow the mindset you layed out in your patch ("the kernel > treats the pathgroups as if they were ordered by priority") > consequently, just determining bestpg is not enough; we'd need to sort > the PGs every time (except for a user-triggered switchgroup command). > IMO whenever we reload the map anyway (e.g. in setup_map()) we should > make sure that the PGs are properly sorted. Using "switch_group" > instead of a full reload is just an optimization because the kernel > operation is more light-weight than a full reload. But as soon as we > have e.g. a marginal path group, reordering is probably a better idea > most of the time. We already do correctly order the paths in setup_map(). setup_map() -> group_paths() -> sort_pathgroup(). Actually, looking at this, I don't see why we even bother to call select_path_group() in setup_map(). The answer will always be 1, since we just sorted them. -Ben > I agree that that would be another patch set, but I think that > determining the best path group and checking whether the groups are > correctly ordered are very closely related tasks. > > But it's not a religious matter to me, so proceed with what you > consider best at this time. > > > There's no reason for path_groups_in_order to be in libmultipath, > > since > > it's only needed by multipathd. I'll fix that. But I would rather not > > join it with select_path_group(). Since we just loop over the > > pathgroups > > and not the paths within them, we will likely go through the loop a > > couple of times, and we don't actually perform any costly actions > > during > > the loops that would make combining them look more attractive. The > > performance gains for need_switch_pathgroup() aren't worth making the > > logic harder to follow (and the minor performance hits when we don't > > need to check the order), IMHO. > > > > Regards, > Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel