Added to TODO: * Improve port/qsort() to handle sorts with 50% unique and 50% duplicate value [qsort] This involves choosing better pivot points for the quicksort. --------------------------------------------------------------------------- Dann Corbit wrote: > > > > -----Original Message----- > > From: pgsql-hackers-owner@xxxxxxxxxxxxxx [mailto:pgsql-hackers- > > owner@xxxxxxxxxxxxxx] On Behalf Of Tom Lane > > Sent: Wednesday, February 15, 2006 5:22 PM > > To: Ron > > Cc: pgsql-performance@xxxxxxxxxxxxxx; pgsql-hackers@xxxxxxxxxxxxxx > > Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create > Index > > behaviour) > > > > Ron <rjpeace@xxxxxxxxxxxxx> writes: > > > How are we choosing our pivots? > > > > See qsort.c: it looks like median of nine equally spaced inputs (ie, > > the 1/8th points of the initial input array, plus the end points), > > implemented as two rounds of median-of-three choices. With half of > the > > data inputs zero, it's not too improbable for two out of the three > > samples to be zeroes in which case I think the med3 result will be > zero > > --- so choosing a pivot of zero is much more probable than one would > > like, and doing so in many levels of recursion causes the problem. > > Adding some randomness to the selection of the pivot is a known > technique to fix the oddball partitions problem. However, Bentley and > Sedgewick proved that every quick sort algorithm has some input set that > makes it go quadratic (hence the recent popularity of introspective > sort, which switches to heapsort if quadratic behavior is detected. The > C++ template I submitted was an example of introspective sort, but > PostgreSQL does not use C++ so it was not helpful). > > > I think. I'm not too sure if the code isn't just being sloppy about > the > > case where many data values are equal to the pivot --- there's a > special > > case there to switch to insertion sort, and maybe that's getting > invoked > > too soon. > > Here are some cases known to make qsort go quadratic: > 1. Data already sorted > 2. Data reverse sorted > 3. Data organ-pipe sorted or ramp > 4. Almost all data of the same value > > There are probably other cases. Randomizing the pivot helps some, as > does check for in-order or reverse order partitions. > > Imagine if 1/3 of the partitions fall into a category that causes > quadratic behavior (have one of the above formats and have more than > CUTOFF elements in them). > > It is doubtful that the switch to insertion sort is causing any sort of > problems. It is only going to be invoked on tiny sets, for which it has > a fixed cost that is probably less that qsort() function calls on sets > of the same size. > > >It'd be useful to get a line-level profile of the behavior of > > this code in the slow cases... > > I guess that my in-order or presorted tests [which often arise when > there are very few distinct values] may solve the bad partition > problems. Don't forget that the algorithm is called recursively. > > > regards, tom lane > > > > ---------------------------(end of > broadcast)--------------------------- > > TIP 3: Have you checked our extensive FAQ? > > > > http://www.postgresql.org/docs/faq > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- Bruce Momjian http://candle.pha.pa.us SRA OSS, Inc. http://www.sraoss.com + If your life is a hard drive, Christ can be your backup. +