Re: Sequential vs. random values - number of pages in B-tree

Francisco Olarte <folarte@xxxxxxxxxxxxxx> · Fri, 19 Aug 2016 19:02:59 +0200

On Fri, Aug 19, 2016 at 3:20 PM, Daniel Verite <daniel@xxxxxxxxxxxxxxxx> wrote:
> There's a simple technique that works on top of a Feistel network,
> called the cycle-walking cipher. Described for instance at:
> http://web.cs.ucdavis.edu/~rogaway/papers/subset.pdf
> I'm using the opportunity to add a wiki page:
> https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range
> with sample plgsql code for the [0..10,000,000] range that might be useful
> for other cases.

Nice reference, nice WikiPage, nice job. Bookmarking every thing.

> But for the btree fragmentation and final size issue, TBH I don't expect
> that constraining the values within that smaller range will make any
> difference
> in the tests, because it's the dispersion that matters, not the values
> themselves.

Neither do I, that is why I stated probabley good enough for tests,  but....

> I mean that, whether the values are well dispersed in the [0..1e7] range or
> equally well dispersed in the [0..2**32] range, the probability of a newly
> inserted value to compare greater or lower to any previous values of the list
> should be the same, so shouldn't the page splits be the same, statistically
> speaking?

I know some btrees do prefix-coding of the keys, and some do it even
with long integers, and some others do delta-coding for integer keys.
I seem to recall postgres does not, that's whay I do not think it will
make a difference.

But anyway, to compare two things like that, as the original poster
was doing, I normally prefer to test just one thing at a time, that's
why I would normally try to do it by writing a sorted file, shuffling
it with sort -R, and copying it, server side if posible, to eliminate
so both

Francisco Olarte.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general