Re: [HACKERS] qsort again (was Re: Strange Create

Ron <rjpeace@xxxxxxxxxxxxx> · Fri, 17 Feb 2006 08:23:40 -0500

At 05:19 AM 2/17/2006, Markus Schaber wrote:
Hi, Ron,

Ron schrieb:

> OK, so here's _a_ way (there are others) to obtain a mapping such that
>  if a < b then f(a) < f (b) and
>  if a == b then f(a) == f(b)
>
> Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
> integer; a 4KB row becomes a 32Kb integer; etc)
> Since even a 1TB table made of such rows can only have 256M - 512M
> possible values even if each row is unique, a 28b or 29b key is large
> enough to represent each row's value and relative rank compared to all
> of the others even if all row values are unique.
>
> By scanning the table once, we can map say 0000001h (Hex used to ease
> typing) to the row with the minimum value and 1111111h to the row with
> the maximum value as well as mapping everything in between to their
> appropriate keys.  That same scan can be used to assign a pointer to
> each record's location.

But with a single linear scan, this cannot be accomplished, as the table
contents are neither sorted nor distributed linearly between the minimum
and the maximum.
So what?  We are talking about key assignment here, not anything that 
requires physically manipulating the actual DB rows.
One physical IO pass should be all that's needed.

For this mapping, you need a full table sort.
One physical IO pass should be all that's needed.  However, let's 
pretend you are correct and that we do need to sort the table to get 
the key mapping.  Even so, we would only need to do it =once= and 
then we would be able to use and incrementally update the results 
forever afterward.  Even under this assumption, one external sort to 
save all subsequent such sorts seems well worth it.

IOW, even if I'm wrong about the initial cost to do this; it is still 
worth doing ;-)

> That initial scan to set up the keys is expensive, but if we wish that
> cost can be amortized over the life of the table so we don't have to pay
> it all at once.  In addition, once we have created those keys, then can
> be saved for later searches and sorts.

But for every update or insert, you have to resort the keys, which is
_very_ expensive as it basically needs to update a huge part of the table.

??? You do not need to resort already ordered data to insert a new 
element into it such that the data stays ordered!  Once we have done 
the key ordering operation once, we should not ever need to do it 
again on the original data.  Else most sorting algorithms wouldn't work ;-)

Ron