Search Postgresql Archives

Re: big database resulting in small dump

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07/21/2012 02:05 AM, Ilya Ivanov wrote:
I have a 8.4 database (installed on ubuntu 10.04 x86_64). It holds Zabbix database. The database on disk takes 10Gb. SQL dump takes only 2Gb. I've gone through http://archives.postgresql.org/pgsql-general/2008-08/msg00316.php and got some hints. Naturally, the biggest table is history (the second biggest is history_uint. Together they make about 95% of total size). I've tried to perform CLUSTER on it, but seemed to be taking forever (3 hours and still not completed). So I cancelled it and went with database drop and restore. It resulted in database taking up 6.4Gb instead of 10Gb. This is a good improvement, but still isn't quite what I expect. I would appreciate some clarification.

To elaborate on the answers already posted:

Plain text dumps only contain the data its self. In many databases the table contents are a small part of the overall database size. Additionally, data is stored on disk in a structure optimised for speed of access, not disk space consumption, so the same data can be much more compact in the dump format.

Finally, data in dumps is much, much more efficiently compressed than it is in the tables. In tables the main rows aren't compressed at all, and TOASTed values like big text fields are individually compressed, which is immensely less space efficient than compressing them all together. On the other hand, it allows random access where the dump format just doesn't - kind of important for a database!


The rest of the space is used by:

- Indexes, which can get quite big.

- Free space in tables from deleted rows that haven't yet been replaced by a new inserted row.
  If you have a non-default FILLFACTOR there can be lots of this.

- "bloat" - wasted space in tables and indexes, typically caused by insufficiently frequent autovacuum

- ... probably more I've forgotten


When you dump and reload you not only get rid of any bloat in your tables and indexes, but you effectively REINDEX your database. PostgreSQL can often create much more compact and efficient index structures when it does a CREATE INDEX on a full table (like when restoring a dump) than when it does a CREATE INDEX on an empty table followed by lots of inserts.

--
Craig Ringer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux