> -----Original Message----- > From: pgsql-hackers-owner@xxxxxxxxxxxxxx [mailto:pgsql-hackers- > owner@xxxxxxxxxxxxxx] On Behalf Of Tom Lane > Sent: Wednesday, February 15, 2006 5:22 PM > To: Ron > Cc: pgsql-performance@xxxxxxxxxxxxxx; pgsql-hackers@xxxxxxxxxxxxxx > Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index > behaviour) > > Ron <rjpeace@xxxxxxxxxxxxx> writes: > > How are we choosing our pivots? > > See qsort.c: it looks like median of nine equally spaced inputs (ie, > the 1/8th points of the initial input array, plus the end points), > implemented as two rounds of median-of-three choices. With half of the > data inputs zero, it's not too improbable for two out of the three > samples to be zeroes in which case I think the med3 result will be zero > --- so choosing a pivot of zero is much more probable than one would > like, and doing so in many levels of recursion causes the problem. Adding some randomness to the selection of the pivot is a known technique to fix the oddball partitions problem. However, Bentley and Sedgewick proved that every quick sort algorithm has some input set that makes it go quadratic (hence the recent popularity of introspective sort, which switches to heapsort if quadratic behavior is detected. The C++ template I submitted was an example of introspective sort, but PostgreSQL does not use C++ so it was not helpful). > I think. I'm not too sure if the code isn't just being sloppy about the > case where many data values are equal to the pivot --- there's a special > case there to switch to insertion sort, and maybe that's getting invoked > too soon. Here are some cases known to make qsort go quadratic: 1. Data already sorted 2. Data reverse sorted 3. Data organ-pipe sorted or ramp 4. Almost all data of the same value There are probably other cases. Randomizing the pivot helps some, as does check for in-order or reverse order partitions. Imagine if 1/3 of the partitions fall into a category that causes quadratic behavior (have one of the above formats and have more than CUTOFF elements in them). It is doubtful that the switch to insertion sort is causing any sort of problems. It is only going to be invoked on tiny sets, for which it has a fixed cost that is probably less that qsort() function calls on sets of the same size. >It'd be useful to get a line-level profile of the behavior of > this code in the slow cases... I guess that my in-order or presorted tests [which often arise when there are very few distinct values] may solve the bad partition problems. Don't forget that the algorithm is called recursively. > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq