I have a large table in Postgres 9.4.4 called 'observation', which is by far the largest item in the database. When I pg_dump this table the resulting file is about 9GB. However I've noticed that the pg data directory uses about 30GB (excluding pg_xlog).
Looking at the space usage:
-- Size of 'warehouse' database:
select pg_size_pretty(pg_database_size('warehouse'));
-- 29GB
-- Total space used by observation table including indexes:
select pg_size_pretty(pg_total_relation_size('observation'));
-- 29GB
-- Excluding indexes:
select pg_size_pretty(pg_relation_size('observation'));
-- 20GB
-- Percentage of dead tuples:
select pg_stat_get_dead_tuples('observation'::regclass)::numeric * 100 / pg_stat_get_live_tuples('observation'::regclass);
-- 13%
-- Average length of a row in bytes:
select avg(octet_length(t.*::text)) FROM observation t;
-- 287 bytes
Looking at the space usage:
-- Size of 'warehouse' database:
select pg_size_pretty(pg_database_size('warehouse'));
-- 29GB
-- Total space used by observation table including indexes:
select pg_size_pretty(pg_total_relation_size('observation'));
-- 29GB
-- Excluding indexes:
select pg_size_pretty(pg_relation_size('observation'));
-- 20GB
-- Percentage of dead tuples:
select pg_stat_get_dead_tuples('observation'::regclass)::numeric * 100 / pg_stat_get_live_tuples('observation'::regclass);
-- 13%
-- Average length of a row in bytes:
select avg(octet_length(t.*::text)) FROM observation t;
-- 287 bytes
-- Number of rows * average size of row:
select pg_size_pretty(count(obs_id) * 287) from observation;
-- 9.4 GB
select pg_size_pretty(count(obs_id) * 287) from observation;
-- 9.4 GB
If the live tuples take 9.4GB, then dead ones take 1.2GB = 10.6GB total.
What accounts for the remaining 9.4GB? (20GB - 10.6GB)
Steve Pritchard
British Trust for Ornithology