On Wed, 2022-08-24 at 11:16 -0500, Benjamin Marzinski wrote: > On Wed, Aug 24, 2022 at 09:07:56AM +0200, Martin Wilck wrote: > > On Tue, 2022-08-23 at 15:48 -0500, Benjamin Marzinski wrote: > > > On Thu, Aug 18, 2022 at 11:06:28PM +0200, mwilck@xxxxxxxx wrote: > > > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > > > > > If the mptable is very large (for example, in a configuration > > > > with > > > > lots of maps assigned individual aliases), merge_mptable may > > > > get > > > > very slow because it needs to make O(n^2) string comparisons > > > > (with > > > > n being the number of mptable entries). WWID strings often > > > > differ > > > > only in the last few bytes, causing a lot of CPU time spent in > > > > strcmp(). > > > > > > > > merge_mptable is executed whenever multipath or multipathd is > > > > started, that > > > > is, also for "multipath -u" and "multipath -U" invocations from > > > > udev rules. > > > > Optimize it by sorting the mptable before merging. This way we > > > > don't need > > > > to iterate towards the end of the vector searching for > > > > duplicates. > > > > > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > > > --- > > > > libmultipath/config.c | 15 +++++++++++++-- > > > > libmultipath/vector.c | 8 ++++++++ > > > > libmultipath/vector.h | 1 + > > > > 3 files changed, 22 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/libmultipath/config.c b/libmultipath/config.c > > > > index ab8b26e..a2c79a4 100644 > > > > --- a/libmultipath/config.c > > > > +++ b/libmultipath/config.c > > > > @@ -520,6 +520,14 @@ merge_mpe(struct mpentry *dst, struct > > > > mpentry > > > > *src) > > > > merge_num(mode); > > > > } > > > > > > > > +static int wwid_compar(const void *p1, const void *p2) > > > > +{ > > > > + const char *wwid1 = (*(struct mpentry * const *)p1)- > > > > >wwid; > > > > + const char *wwid2 = (*(struct mpentry * const *)p2)- > > > > >wwid; > > > > + > > > > + return strcmp(wwid1, wwid2); > > > > +} > > > > + > > > > void merge_mptable(vector mptable) > > > > { > > > > struct mpentry *mp1, *mp2; > > > > @@ -533,10 +541,13 @@ void merge_mptable(vector mptable) > > > > free_mpe(mp1); > > > > continue; > > > > } > > > > + } > > > > + vector_sort(mptable, wwid_compar); > > > > + vector_foreach_slot(mptable, mp1, i) { > > > > j = i + 1; > > > > vector_foreach_slot_after(mptable, mp2, j) { > > > > - if (!mp2->wwid || strcmp(mp1->wwid, > > > > mp2- > > > > > wwid)) > > > > - continue; > > > > + if (strcmp(mp1->wwid, mp2->wwid)) > > > > + break; > > > > condlog(1, "%s: duplicate multipath > > > > config > > > > section for %s", > > > > __func__, mp1->wwid); > > > > merge_mpe(mp2, mp1); > > > > > > The way merge_mpe() works, values are only copied from mp1's > > > variables > > > if the corresponding variable is unset in mp2. This requires the > > > original order of mp1 and mp2 to be unchanged by the sorting > > > algorithm, > > > but according to the qsort() man page "If two members compare as > > > equal, > > > their order in the sorted array is undefined." > > > > > > One quick and dirty way we could fix this is to add a variable to > > > struct > > > mptable called config_idx, which is its index in the config > > > file. If > > > the wwids are equal, you compare that. > > > > Thanks for pointing this out. I believe it's easier than that: as > > we're > > passed the offsets into the array (aka struct mpentry **), we can > > simply compare p1 and p2 if the strings are equal. > > > > Agree? > > I don't think so. Comparing the array offsets assumes that two > mpentries > are still ordered correctly when they are compared against each > other. > But the way quick sort works, array elements can get swapped around > multiple times, based on whether they are larger or smaller than some > pivot point. It's possible that the two mpentries already switched > order > before they were compared. > > Essentially, all comparing the offset does is force qsort to not > switch > the place of two otherwise equal entries. But for speed reasons > alone, > there is no reason why qsort would bother to swap the location of two > equal entries. > > Here's an example of what could go wrong: > > Say you start with the array (I'm also tracking the mpentry's > original > config index) > > array offset: 0 1 2 3 4 > config_idx: 0 1 2 3 4 > wwid: D B C D A > > Say qsort picks the element at array offset 2 (wwid "C") as the > pivot. > Usually quicksort works by walking towards the center of the array > segment from both ends, swapping the positions of elements bigger > than > the pivot with the positions of ones smaller than or equal to the > pivot. > So after the first round you would swap the element at array offset 4 > (wwid "A") with the element at array offset 0 (wwid "D"). This would > give you. > > array offset: 0 1 2 3 4 > config_idx 4 1 2 3 0 > wwid: A B C D D > > After this no further swaps will happen using the original > wwid_compar(). Adding a comparison to the array offset won't change > anything. But the wwid "D" mpentry that was orinally earlier in the > config (config_idx = 0) is now after the wwid "D" mpentry that was > originally later (config_idx = 3). > > Comparing the config_idx will cause the elements at array offset 3 > and 4 > to switch places, restoring their original ordering. Hm, too bad. I don't like the idea of adding another field to the array just to stabilize the sort. But a fast, stable sort algorithm in for strings in C seems to be hard to find. So yes, let's add the sort index for now, perhaps we'll find a more elegant solution later. Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel