Re: Index Size

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Tue, 31 May 2011 14:22:38 +0800

On 30/05/11 20:53, Nick Raj wrote:
> Hi,
> 
> Cube code provided by postgres contrib folder. It uses the NDBOX structure.
> On creating index, it's size increase at a high rate.

Here's what I get on 8.4 with a cleaned up test case. It uses the
original poster's data ( output_vehicle.sql ) and just automates adding
rows in powers of 2 and checking the table size.

(Quoted because it is the only way to stop Thunderbird wrapping text in
plain text editing mode - argh!)

>  nrows | tablesize | totalsize | indexsize | bpr_row | bpr_total | bpr_index 
> -------+-----------+-----------+-----------+---------+-----------+-----------
>      0 |         0 |      8192 |      8192 |         |           |          
>      2 |      8192 |     16384 |      8192 |    4096 |      8192 |      4096
>      4 |      8192 |     16384 |      8192 |    2048 |      4096 |      2048
>      8 |      8192 |     16384 |      8192 |    1024 |      2048 |      1024
>     16 |      8192 |     16384 |      8192 |     512 |      1024 |       512
>     32 |      8192 |     16384 |      8192 |     256 |       512 |       256
>     64 |      8192 |     16384 |      8192 |     128 |       256 |       128
>    128 |     16384 |     65536 |     49152 |     128 |       512 |       384
>    256 |     24576 |    122880 |     98304 |      96 |       480 |       384
>    512 |     49152 |    229376 |    180224 |      96 |       448 |       352
>   1024 |     90112 |    327680 |    237568 |      88 |       320 |       232
>   2048 |    180224 |   1376256 |   1196032 |      88 |       672 |       584
>   4096 |    352256 |   2228224 |   1875968 |      86 |       544 |       458
>   8192 |    696320 |   3751936 |   3055616 |      85 |       458 |       373
>  16384 |   1384448 |  13254656 |  11870208 |      84 |       809 |       724
>  19875 |   1679360 |  15466496 |  13787136 |      84 |       778 |       693
> (16 rows)

As expected, the bytes cost per table row sans index (bpr_row) tends
down slowly toward a stable value. The index size per row (bpr_index) is
all over the place, but seems consistently pretty huge.

At 19875 rows the index is 90% of the total size, or over 8 times the
size of the table its self.

While I realize that a general-purpose index for multi-dimensional
structures like cubes may not be hugely efficient, is this kind of huge
storage consumption expected? The memory hit will be so bad that it's
likely to be faster to use full table scans.

Run the attached test as:

   psql -f test.sql -q regress

where "regress" is the name of the database to do the work in. The
original poster's "output_vehicle.sql" must be in the same directory.

--
Craig Ringer
-- You must inport the `cube' contrib module to use this code

-- DROP TABLE IF EXISTS cubtest;
-- CREATE TABLE cubtest (c cube);
-- COMMENT ON TABLE cubtest IS 'Holds raw cube data import; lacks indexes';
-- \i output_vehicle.sql

CREATE LANGUAGE 'plpgsql';
DROP TABLE IF EXISTS cubtest2, sizes CASCADE;

CREATE TABLE cubtest2 (c cube);
CREATE INDEX t on cubtest2 using gist(c);
COMMENT ON TABLE cubtest2 IS 'Indexed table that is progressively added to during testing';

CREATE TABLE sizes (nrows INTEGER, tablesize INTEGER, totalsize INTEGER, indexsize INTEGER, bpr_row float, bpr_total float, bpr_index float);

CREATE OR REPLACE FUNCTION get_sizes(OUT nrows INTEGER, OUT tablesize_rows INTEGER, OUT tablesize_total INTEGER, OUT tablesize_index INTEGER, OUT bytes_per_row float, OUT bytes_per_row_total float, OUT bytes_per_row_index float) AS $$
BEGIN
	tablesize_rows = pg_relation_size('cubtest2');
	tablesize_total = pg_total_relation_size('cubtest2');
	tablesize_index = tablesize_total - tablesize_rows;
	nrows = (select count(*) from cubtest2);
	IF nrows <> 0 THEN
		bytes_per_row = tablesize_rows/ nrows;
		bytes_per_row_total = tablesize_total / nrows;
		bytes_per_row_index = tablesize_index / nrows;
	END IF;
END;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION size_test() RETURNS SETOF sizes AS $$
DECLARE
	cubtest_rows INTEGER;
	inserted_rows INTEGER;
	o INTEGER;
BEGIN
	INSERT INTO sizes SELECT * FROM get_sizes();
	cubtest_rows = (SELECT count(*) FROM cubtest);
	INSERT INTO cubtest2 SELECT * FROM cubtest OFFSET 0 LIMIT 1;
	cubtest_rows = (SELECT count(*) FROM cubtest);
	inserted_rows = 1;
	WHILE inserted_rows < cubtest_rows LOOP
		INSERT INTO cubtest2 SELECT * FROM cubtest OFFSET inserted_rows LIMIT inserted_rows;
		INSERT INTO sizes SELECT * FROM get_sizes();
		inserted_rows = inserted_rows * 2;
	END LOOP;
	RETURN QUERY SELECT * FROM sizes;
END;
$$ LANGUAGE 'plpgsql';

SELECT * from size_test();
-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general