Re: [HACKERS] qsort again (was Re: Strange Create Index behaviour)

"Dann Corbit" <DCorbit@xxxxxxxxx> · Wed, 15 Feb 2006 17:37:58 -0800

> -----Original Message-----
> From: pgsql-hackers-owner@xxxxxxxxxxxxxx [mailto:pgsql-hackers-
> owner@xxxxxxxxxxxxxx] On Behalf Of Tom Lane
> Sent: Wednesday, February 15, 2006 5:22 PM
> To: Ron
> Cc: pgsql-performance@xxxxxxxxxxxxxx; pgsql-hackers@xxxxxxxxxxxxxx
> Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
Index
> behaviour)
> 
> Ron <rjpeace@xxxxxxxxxxxxx> writes:
> > How are we choosing our pivots?
> 
> See qsort.c: it looks like median of nine equally spaced inputs (ie,
> the 1/8th points of the initial input array, plus the end points),
> implemented as two rounds of median-of-three choices.  With half of
the
> data inputs zero, it's not too improbable for two out of the three
> samples to be zeroes in which case I think the med3 result will be
zero
> --- so choosing a pivot of zero is much more probable than one would
> like, and doing so in many levels of recursion causes the problem.

Adding some randomness to the selection of the pivot is a known
technique to fix the oddball partitions problem.  However, Bentley and
Sedgewick proved that every quick sort algorithm has some input set that
makes it go quadratic (hence the recent popularity of introspective
sort, which switches to heapsort if quadratic behavior is detected.  The
C++ template I submitted was an example of introspective sort, but
PostgreSQL does not use C++ so it was not helpful).

> I think.  I'm not too sure if the code isn't just being sloppy about
the
> case where many data values are equal to the pivot --- there's a
special
> case there to switch to insertion sort, and maybe that's getting
invoked
> too soon.  

Here are some cases known to make qsort go quadratic:
1. Data already sorted
2. Data reverse sorted
3. Data organ-pipe sorted or ramp
4. Almost all data of the same value

There are probably other cases.  Randomizing the pivot helps some, as
does check for in-order or reverse order partitions.

Imagine if 1/3 of the partitions fall into a category that causes
quadratic behavior (have one of the above formats and have more than
CUTOFF elements in them).

It is doubtful that the switch to insertion sort is causing any sort of
problems.  It is only going to be invoked on tiny sets, for which it has
a fixed cost that is probably less that qsort() function calls on sets
of the same size.

>It'd be useful to get a line-level profile of the behavior of
> this code in the slow cases...

I guess that my in-order or presorted tests [which often arise when
there are very few distinct values] may solve the bad partition
problems.  Don't forget that the algorithm is called recursively.

> 			regards, tom lane
> 
> ---------------------------(end of
broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
> 
>                http://www.postgresql.org/docs/faq