Re: [HACKERS] qsort again (was Re: Strange Create

Markus Schaber <schabi@xxxxxxxxxxxx> · Fri, 17 Feb 2006 11:19:45 +0100

Hi, Ron,

Ron schrieb:

> OK, so here's _a_ way (there are others) to obtain a mapping such that
>  if a < b then f(a) < f (b) and
>  if a == b then f(a) == f(b)
> 
> Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
> integer; a 4KB row becomes a 32Kb integer; etc)
> Since even a 1TB table made of such rows can only have 256M - 512M
> possible values even if each row is unique, a 28b or 29b key is large
> enough to represent each row's value and relative rank compared to all
> of the others even if all row values are unique.
> 
> By scanning the table once, we can map say 0000001h (Hex used to ease
> typing) to the row with the minimum value and 1111111h to the row with
> the maximum value as well as mapping everything in between to their
> appropriate keys.  That same scan can be used to assign a pointer to
> each record's location.

But with a single linear scan, this cannot be accomplished, as the table
contents are neither sorted nor distributed linearly between the minimum
and the maximum.

For this mapping, you need a full table sort.

> That initial scan to set up the keys is expensive, but if we wish that
> cost can be amortized over the life of the table so we don't have to pay
> it all at once.  In addition, once we have created those keys, then can
> be saved for later searches and sorts.

But for every update or insert, you have to resort the keys, which is
_very_ expensive as it basically needs to update a huge part of the table.

Markus