On Tue, Jun 06, 2023 at 04:32:13PM +0000, Martin Wilck wrote: > On Tue, 2023-06-06 at 10:54 -0500, Benjamin Marzinski wrote: > > On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote: > > > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote: > > > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski > > > > > > > > > Actually, after looking into this more, pushing those two > > > > functions > > > > together makes the logic more confusing. Plus select_path_group() > > > > is > > > > used by multiple other functions that don't need to check if the > > > > path > > > > groups are out of order. > > > > > > Hm. Can it happen at all that select_path_group() returns something > > > other than 1 but path_groups_in_order() returns true? > > > > Yes. It might even be the common case. Say a switch goes down and all > > the paths in the high priority pathgroup fail. The kernel will switch > > over to a lower priority pathgroup. As long as those paths work, it > > won't automatically switch back to the high priority pathgroup when > > we > > tell it that those failed paths have recovered. It's multipath's job > > to > > tell it when to proactively switch pathgroups. Since multipath > > doesn't > > update the priority of failed paths, the pathgroups should still look > > the same (unless you use group_by_prio and the path fails between > > checking the state and running the prioritizer, in which case you > > will > > likely get a PRIO_UNDEF and reconfigure the pathgroups, but that's > > the > > thing group_by_tpg is trying to resolve). > > Ok, this is subtle; it's caused by the fact that path_groups_in_order() > ignores the ordering of PGs with pgp->prio = PRIO_UNDEF (which will be > the prio of a PG with only failed paths), whereas select_path_group() > will ignore such PGs it in a different way - by never selecting them. > I hope I understand correctly now. > > I have to say this is confusing. We have different concepts of how path > priority and path state together affect the PG priority, and we apply > these different concepts in different parts of the code. I'm not saying > it's wrong, but at the moment I'm too confused to tell if it's right. It might make sense to have these be even more different. Perhaps select_path_group should stay the same but sort_pathgroups() and path_groups_in_order() should just look at the priorities of the paths that have a non-PRIO_UNDEF priority and use the total number of paths for tie-breakers. This would mean that they would order the pathgroups how they should be when everything is working correctly. That way if something forces the kernel to pick a new pathgroup (which is likely the failure of all the paths in the current pathgroup), it will switch to the pathgroup that, all things being equal, should be the best. Obviously, if there is another pathgroup of equal prioity with more usable paths, multipath can tell it to switch to that one, but I assume that even without group_by_prio, multiple pathgroups with the same priority will be uncommon. In this situation, it would make sense to call select_path_group() after calling group_paths(), since the first pathgroup might not currently be the best pathgroup, if its paths are down. At any rate, it's not something to worry about in this patchset. -Ben > > > > > If we follow the mindset you layed out in your patch ("the kernel > > > treats the pathgroups as if they were ordered by priority") > > > consequently, just determining bestpg is not enough; we'd need to > > > sort > > > the PGs every time (except for a user-triggered switchgroup > > > command). > > > IMO whenever we reload the map anyway (e.g. in setup_map()) we > > > should > > > make sure that the PGs are properly sorted. Using "switch_group" > > > instead of a full reload is just an optimization because the kernel > > > operation is more light-weight than a full reload. But as soon as > > > we > > > have e.g. a marginal path group, reordering is probably a better > > > idea > > > most of the time. > > > > We already do correctly order the paths in setup_map(). > > setup_map() -> group_paths() -> sort_pathgroup(). Actually, looking > > at > > this, I don't see why we even bother to call select_path_group() in > > setup_map(). The answer will always be 1, since we just sorted them. > > > > Right. I suppose the call to select_path_groups() predates the one to > sort_pathgroups(). > > Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel