Re: [HACKERS] qsort again (was Re: Strange Create

Ron <rjpeace@xxxxxxxxxxxxx> · Fri, 17 Feb 2006 08:01:34 -0500

At 04:24 AM 2/17/2006, Ragnar wrote:
On fös, 2006-02-17 at 01:20 -0500, Ron wrote:
>
> OK, so here's _a_ way (there are others) to obtain a mapping such that
>   if a < b then f(a) < f (b) and
>   if a == b then f(a) == f(b)

> By scanning the table once, we can map say 0000001h (Hex used to ease
> typing) to the row with the minimum value and 1111111h to the row
> with the maximum value as well as mapping everything in between to
> their appropriate keys.  That same scan can be used to assign a
> pointer to each record's location.

This step is just as expensive as the original 
sort you want to replace/improve.

Why do you think that?  External sorts involve 
the equivalent of multiple scans of the table to 
be sorted, sometimes more than lgN (where N is 
the number of items in the table to be 
sorted).  Since this is physical IO we are 
talking about, each scan is very expensive, and 
therefore 1 scan is going to take considerably 
less time than >= lgN scans will be.

If you want to keep this mapping saved as a sort 
of an index, or as part ot each row data, this 
will make the cost of inserts and updates enormous.

Not sure you've got this right either.  Looks to 
me like we are adding a <= 32b quantity to each 
row.  Once we know the mapping, incrementally 
updating it upon insert or update would seem to 
be simple matter of a fast search for the correct 
ranking [Interpolation search, which we have all 
the needed data for, is O(lglgN).  Hash based 
search is O(1)]; plus an increment/decrement of 
the key values greater/less than the key value of 
the row being inserted / updated.  Given than we 
are updating all the keys in a specific range 
within a tree structure, that update can be done 
in O(lgm) (where m is the number of records affected).

>  We can now sort the key+pointer pairs instead of the actual data and
> use an optional final pass to rearrange the actual rows if we wish.

How are you suggesting this mapping be accessed? 
If the mapping is kept separate from the tuple 
data, as in an index, then how will you look up the key?
???  We've effectively created a data set where 
each record is a pointer to a DB row plus its 
key.  We can now sort the data set by key and 
then do an optional final pass to rearrange the 
actual DB rows if we so wish.  Since that final 
pass is very expensive, it is good that not all 
use scenarios will need that final pass.

The amount of storage required to sort this 
representation of the table rather than the 
actual table is so much less that it turns an 
external sorting problem into a internal sorting 
problem with an optional final pass that is =1= 
scan (albeit one scan with a lot of seeks and 
data movement).  This is a big win.  It is a 
variation of a well known technique.  See Sedgewick, Knuth, etc.

> That initial scan to set up the keys is expensive, but if we wish
> that cost can be amortized over the life of the table so we don't
> have to pay it all at once.  In addition, once we have created those
> keys, then can be saved for later searches and sorts.

What is the use case where this would work better than a
regular btree index ?
Again, ???  btree indexes address different 
issues.  They do not in any way help create a 
compact data representation of the original data 
that saves enough space so as to turn an external 
ranking or sorting problem into an internal one.

Ron