Hello,
I have two large tables in a database, one containing original data and
the other one derived from the first table. The first table contains
several columns and indexes, while the second table has less columns and
only one index. Both tables have the same number of rows. Nevertheless,
the second table is much larger in disk size than the first one. How can
this be explained?
The tables are defined below. The first table 'measurement' contains
information about measurements by different devices on different times.
The second table 'stats_count' is created in order to be able to quickly
query the number of measurements for a particular device in an arbitrary
time interval. To accomplish this the second table contains for each
measurement a cumulative count of the number of measurements up until
that time. The table is kept synchronised with the first table with
triggers.
The first table contains 4 integer columns, one enum column and one time
column, which would normally take 28 bytes for each row. The table
contains about 50 million rows. This means about 1.5 gigabyte of data.
The actual disk size of this table (without indexes) is 3 gigabyte,
which is propably explainable because of some overhead. The disk size
with indexes is about 9 gigabyte.
The second table on the other hand only contains 3 integer columns and
one time column. Nevertheless the size on disk of this table is about 14
gigabyte without indexes. There is one index on this table which uses an
extra 4 gigabyte. Can anyone explain this?
One possible reason is that the primary key on table measurement is an
integer, while the primary key on table stats_count is a combination of
two integers and a time. Logically it requires more disk space to store
the second key, however the same two integers and time also form an
index in the first table, so should take the same amount of disk space
on the first table. Or does a primary key take more disk space than
another index or unique constraint? Would it therefore be more efficient
(=less disk space) to use an extra id column in the table stats_count as
primary key and the combination (deviceid, type, time) as a regular
index? And if so, will a query be still as fast (I can imagine that if
an index uses less disk space, it will not be as fast).
Another possible reason could be the foreign key that is present in the
table stats_count that references the combination (deviceid, type, time)
in table measurements. Does this foreign key take that much disk space?
Any help would be greatly appreciated,
Kind regards,
Rik
CREATE TABLE measurement
(
id serial NOT NULL,
deviceid integer NOT NULL,
"time" timestamp without time zone NOT NULL,
javaobjectid integer NOT NULL,
javaobjectsubid integer NOT NULL,
"type" measurementtype NOT NULL,
CONSTRAINT measurement_pkey PRIMARY KEY (id),
CONSTRAINT measurement_deviceid_fkey FOREIGN KEY (deviceid)
REFERENCES device (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT measurement_javaobject_unique UNIQUE (javaobjectid,
javaobjectsubid),
CONSTRAINT measurement_unique UNIQUE (type, deviceid, "time")
)
CREATE INDEX measurement_javaobjectsubid_index
ON measurement
USING btree
(javaobjectsubid);
CREATE TABLE stats_count
(
deviceid integer NOT NULL,
"time" timestamp without time zone NOT NULL,
"type" measurementtype NOT NULL,
cum_count integer,
CONSTRAINT stats_count_pkey PRIMARY KEY (type, deviceid, "time"),
CONSTRAINT stats_count_type_fkey FOREIGN KEY ("type", deviceid, "time")
REFERENCES measurement ("type", deviceid, "time") MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general