On Thu, Aug 25, 2022 at 09:44:22AM +0200, Martin Wilck wrote: > On Wed, 2022-08-24 at 19:02 +0200, Martin Wilck wrote: > > On Wed, 2022-08-24 at 11:16 -0500, Benjamin Marzinski wrote: > > > On Wed, Aug 24, 2022 at 09:07:56AM +0200, Martin Wilck wrote: > > > > On Tue, 2022-08-23 at 15:48 -0500, Benjamin Marzinski wrote: > > > > > On Thu, Aug 18, 2022 at 11:06:28PM +0200, > > > > > mwilck@xxxxxxxx wrote: > > > > > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > > > > > > > > > If the mptable is very large (for example, in a configuration > > > > > > with > > > > > > lots of maps assigned individual aliases), merge_mptable may > > > > > > get > > > > > > very slow because it needs to make O(n^2) string comparisons > > > > > > (with > > > > > > n being the number of mptable entries). WWID strings often > > > > > > differ > > > > > > only in the last few bytes, causing a lot of CPU time spent > > > > > > in > > > > > > strcmp(). > > > > > > > > > > > > merge_mptable is executed whenever multipath or multipathd is > > > > > > started, that > > > > > > is, also for "multipath -u" and "multipath -U" invocations > > > > > > from > > > > > > udev rules. > > > > > > Optimize it by sorting the mptable before merging. This way > > > > > > we > > > > > > don't need > > > > > > to iterate towards the end of the vector searching for > > > > > > duplicates. > > > > > > > > > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > > > > > --- > > > > > > libmultipath/config.c | 15 +++++++++++++-- > > > > > > libmultipath/vector.c | 8 ++++++++ > > > > > > libmultipath/vector.h | 1 + > > > > > > 3 files changed, 22 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/libmultipath/config.c b/libmultipath/config.c > > > > > > index ab8b26e..a2c79a4 100644 > > > > > > --- a/libmultipath/config.c > > > > > > +++ b/libmultipath/config.c > > > > > > @@ -520,6 +520,14 @@ merge_mpe(struct mpentry *dst, struct > > > > > > mpentry > > > > > > *src) > > > > > > merge_num(mode); > > > > > > } > > > > > > > > > > > > +static int wwid_compar(const void *p1, const void *p2) > > > > > > +{ > > > > > > + const char *wwid1 = (*(struct mpentry * const *)p1)- > > > > > > > wwid; > > > > > > + const char *wwid2 = (*(struct mpentry * const *)p2)- > > > > > > > wwid; > > > > > > + > > > > > > + return strcmp(wwid1, wwid2); > > > > > > +} > > > > > > + > > > > > > void merge_mptable(vector mptable) > > > > > > { > > > > > > struct mpentry *mp1, *mp2; > > > > > > @@ -533,10 +541,13 @@ void merge_mptable(vector mptable) > > > > > > free_mpe(mp1); > > > > > > continue; > > > > > > } > > > > > > + } > > > > > > + vector_sort(mptable, wwid_compar); > > > > > > + vector_foreach_slot(mptable, mp1, i) { > > > > > > j = i + 1; > > > > > > vector_foreach_slot_after(mptable, mp2, j) { > > > > > > - if (!mp2->wwid || strcmp(mp1->wwid, > > > > > > mp2- > > > > > > > wwid)) > > > > > > - continue; > > > > > > + if (strcmp(mp1->wwid, mp2->wwid)) > > > > > > + break; > > > > > > condlog(1, "%s: duplicate multipath > > > > > > config > > > > > > section for %s", > > > > > > __func__, mp1->wwid); > > > > > > merge_mpe(mp2, mp1); > > > > > > > > > > The way merge_mpe() works, values are only copied from mp1's > > > > > variables > > > > > if the corresponding variable is unset in mp2. This requires > > > > > the > > > > > original order of mp1 and mp2 to be unchanged by the sorting > > > > > algorithm, > > > > > but according to the qsort() man page "If two members compare > > > > > as > > > > > equal, > > > > > their order in the sorted array is undefined." > > > > > > > > > > One quick and dirty way we could fix this is to add a variable > > > > > to > > > > > struct > > > > > mptable called config_idx, which is its index in the config > > > > > file. If > > > > > the wwids are equal, you compare that. > > > > > > > > Thanks for pointing this out. I believe it's easier than that: as > > > > we're > > > > passed the offsets into the array (aka struct mpentry **), we can > > > > simply compare p1 and p2 if the strings are equal. > > > > > > > > Agree? > > > > > > I don't think so. Comparing the array offsets assumes that two > > > mpentries > > > are still ordered correctly when they are compared against each > > > other. > > > But the way quick sort works, array elements can get swapped around > > > multiple times, based on whether they are larger or smaller than > > > some > > > pivot point. It's possible that the two mpentries already switched > > > order > > > before they were compared. > > > > > > Essentially, all comparing the offset does is force qsort to not > > > switch > > > the place of two otherwise equal entries. But for speed reasons > > > alone, > > > there is no reason why qsort would bother to swap the location of > > > two > > > equal entries. > > > > > > Here's an example of what could go wrong: > > > > > > Say you start with the array (I'm also tracking the mpentry's > > > original > > > config index) > > > > > > array offset: 0 1 2 3 4 > > > config_idx: 0 1 2 3 4 > > > wwid: D B C D A > > > > > > Say qsort picks the element at array offset 2 (wwid "C") as the > > > pivot. > > > Usually quicksort works by walking towards the center of the array > > > segment from both ends, swapping the positions of elements bigger > > > than > > > the pivot with the positions of ones smaller than or equal to the > > > pivot. > > > So after the first round you would swap the element at array offset > > > 4 > > > (wwid "A") with the element at array offset 0 (wwid "D"). This > > > would > > > give you. > > > > > > array offset: 0 1 2 3 4 > > > config_idx 4 1 2 3 0 > > > wwid: A B C D D > > > > > > After this no further swaps will happen using the original > > > wwid_compar(). Adding a comparison to the array offset won't change > > > anything. But the wwid "D" mpentry that was orinally earlier in the > > > config (config_idx = 0) is now after the wwid "D" mpentry that was > > > originally later (config_idx = 3). > > > > > > Comparing the config_idx will cause the elements at array offset 3 > > > and 4 > > > to switch places, restoring their original ordering. > > > > Hm, too bad. > > > > I don't like the idea of adding another field to the array just to > > stabilize the sort. But a fast, stable sort algorithm in for strings > > in > > C seems to be hard to find. So yes, let's add the sort index for now, > > perhaps we'll find a more elegant solution later. > > Digging some more, I found that glibc's qsort() is actually merge sort, > which is a stable sort algorithm and doesn't suffer from this issue. Huh. Good to know. > The glibc documentation is misleading in this respect. Unfortunately > we'can't just support glibc. But we could simply copy in glibc's > msort.c code for other libraries. Sounds reasonable. -Ben > Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel