Re: rtree/gist index taking enormous amount of space in 8.2.3

"Dolafi, Tom" <dolafit@xxxxxxxxxxxxxxxx> · Fri, 29 Jun 2007 14:05:30 -0400

The data is not distributed well...

Top 20 occurrences of fmin and fmax:
   fmin   | count
----------+--------
        0 | 214476
 19281576 |   2870
  2490005 |   2290
  1266332 |   2261
 15539680 |   2086
 11022233 |   2022
 25559658 |   1923
  3054411 |   1906
 10237885 |   1890
 13827272 |   1876
 19187021 |   1847
 18101335 |   1845
  1518230 |   1843
 21199488 |   1842
  1922518 |   1826
  1216144 |   1798
 25802126 |   1762
  8307335 |   1745
 21271866 |   1736
  8361667 |   1721

   fmax   | count
----------+--------
       25 | 197551
 21272002 |    547
 21271988 |    335
 21271969 |    321
  6045781 |    247
  1339301 |    243
 21669151 |    235
  7779506 |    232
  2571422 |    229
  7715946 |    228
 27421323 |    222
  7048089 |    221
    87364 |    219
 13656535 |    217
 26034147 |    214
 19184612 |    213
  7048451 |    213
 21668877 |    213
  6587492 |    212
  9484598 |    212

Also, out of 5.7 million rows there are 1.6 million unique fmin and 1.6
million unique fmax values.

Thanks,
Tom 

-----Original Message-----
From: Craig James [mailto:craig_james@xxxxxxxxxxxxxx] 
Sent: Friday, June 29, 2007 12:14 PM
To: Dolafi, Tom
Cc: Tom Lane; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: [PERFORM] rtree/gist index taking enormous amount of space
in 8.2.3

Dolafi, Tom wrote:
> min(fmin) |   max(fmin)    |    avg(fmin)      
>    1      |   55296469     |    11423945     
> 
> min(fmax) |   max(fmax)    |    avg(fmax)
>   18      |   55553288     |    11424491
> 
> There are 5,704,211 rows in the table.

When you're looking for weird index problems, it's more interesting to
know if there are certain numbers that occur a LOT.  From your
statistics above, each number occurs about 10 times in the table.  But
do some particular numbers occur thousands, or even millions, of times?

Here is a query that will print a list of the highest-occuring values.
You might expect a few occurances of 20, and maybe 30, but if you have
thousands or millions of occurances of certain numbers, then that can
screw up an index.

   select fmax, c from
    (select fmax, count(fmax) as c from your_table group by fmax) as foo
   where c > 3 order by c desc;

Craig