
Re: database 1.2G, pg_dump 73M?!

Ross Boylan wrote:
On Sun, 2008-03-30 at 20:27 +0200, Joris Dobbelsteen wrote:
Ross Boylan wrote:
I have a postgres server for which du reports
1188072	/var/lib/postgresql/8.2/main
on a Linux system.
The server has only one real database, which is for bacula.  When I dump
the database, it's 73 MB.

This is immediately after I did a full vacuum and restarted the server.

Also,
bacula=> SELECT relname, relpages FROM pg_class ORDER BY relpages DESC
limit 15;
             relname             | relpages
---------------------------------+----------
 file_jpfid_idx                  |    27122
 file_pathid_idx                 |    17969
 file_jobid_idx                  |    17948
 file_pkey                       |    14580
 file_fp_idx                     |    12714
 file                            |    11558
 file_filenameid_idx             |     9806
 filename                        |     3958
 filename_name_idx               |     2510
 filename_pkey                   |     1367
 path                            |      966
 path_name_idx                   |      950
 path_pkey                       |      151
 pg_attribute_relid_attnam_index |       46
 pg_proc                         |       45

It seems very strange to me that there is such a difference in size
between the dump and the database: the data store is almost 15 times
larger than the dump.

Is this to be expected (e.g., from the indices taking up disk space)?
Is there anything I can do to reclaim some disk space?
There are a few factors you need to take into account:

    * Data storage in the database is packed into blocks and contains
      header data. Since data needs to be put into blocks there is a
      potential for wasting space; in the worst case a block can end up
      holding little more than a single row.
    * You need to vacuum often, to ensure obsolete rows are removed and
      space can be reused.
    * Tables are not reduced in size, only grown. I think CLUSTER
      and VACUUM FULL will reduce the size of your table.
    * Indexes are not in the backup; they are derived from the table
      data on a restore.
      If you remove the indexes you are left with 150~200 MB of data (I
      guessed).
      Doing a REINDEX will rebuild the index and get rid of all the bloat
      it has collected during use.  _I recommend you try this_, as
      your indexes on the file table look quite huge.
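In psql, that suggestion would look something like this (a sketch; table names taken from the pg_class listing above):

```sql
-- Rebuild all indexes on the bloated tables.
-- Note: REINDEX takes an exclusive lock on each table while it runs,
-- so do this in a quiet window (e.g. when no bacula jobs are active).
REINDEX TABLE file;
REINDEX TABLE filename;
REINDEX TABLE path;
```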
reindexing had a huge effect.  After reindexing the top tables (file,
filename and path) I now see
 SELECT relname, relpages FROM pg_class ORDER BY relpages DESC limit 15;
             relname             | relpages
---------------------------------+----------
 file                            |    11558
 filename                        |     3958
 filename_name_idx               |     2383
 file_jpfid_idx                  |     2145
 file_fp_idx                     |     1787
 file_jobid_idx                  |     1427
 file_pathid_idx                 |     1427
 file_pkey                       |     1427
 file_filenameid_idx             |     1427
 filename_pkey                   |     1367
 path                            |      966
 path_name_idx                   |      871
 path_pkey                       |      151
 pg_attribute_relid_attnam_index |       46
 pg_proc                         |       45
and du now reports 451M.  That still seems a bit large, given the size
of the SQL dump, but it's almost 2/3 lower than it was before.

Thanks so much!

I guess I need to figure out how to reindex automatically.
Take a threshold, e.g. reindex when an index grows toward the size of its table. The bloat is mostly caused by continuous updates to the indexes on every insert, update and delete command. The index needs to split pages that might be merged back some time later. Doing frequent vacuums might, or might not, prevent this. Even in theory, the algorithms allow trees to grow quite large, up to a certain constant factor, in order to keep a good limit on the amount of work that must be done per operation on the index.
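One low-effort way to automate the reindex (a sketch, not from the thread; assumes the postgres user can run psql without a password, e.g. via ident auth) is a cron entry:

```shell
# Hypothetical crontab fragment for the postgres user:
# rebuild the indexes on the three bloated tables every Sunday at 03:00.
# A single -c string with multiple statements works on old psql versions too.
0 3 * * 0  psql -d bacula -c 'REINDEX TABLE file; REINDEX TABLE filename; REINDEX TABLE path;'
```

A smarter script could first compare index relpages against table relpages and only reindex past some threshold, as suggested above.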
But the most important factor for you will be the following:

    * Backups are compressed. Since you store filenames and paths these
      will have a very high amount of regularity and therefore are very
      good targets for compression. This can save a huge amount of data.
      If you take a compression factor of 50%~70% you will reach your 70 MB.
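The compressibility claim is easy to check with synthetic data (a sketch in Python; the paths are made up, but have the kind of regularity a bacula file table would):

```python
import gzip

# Synthetic stand-in for a dump of repetitive file paths.
paths = "\n".join(
    f"/usr/share/doc/package-{i}/README" for i in range(10000)
).encode()

compressed = gzip.compress(paths)
ratio = len(compressed) / len(paths)
# Highly repetitive path data compresses to a small fraction of its size.
```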
I don't see how this is relevant, since my dump file was plain text
(sql).
From the top contenders, about half are indexes, so you are stuck with ~200 MB of data in the tables. PostgreSQL has some wasted space due to placement of the tuples in a block and overhead for each block and row. I don't know the exact values, but the overhead is in the range of 24 bytes per tuple, I believe. Secondly, a block is 8 KB by default and a tuple cannot be stored across multiple blocks (that's what TOAST works around).

All in all: look up your tuple sizes; if they are small, the per-tuple overhead from PostgreSQL can be a big factor. If they are huge, you lose space to unoccupied portions of blocks. I believe pg_statistics will provide this information. Another factor is that the representation in the SQL dump may be more efficient than in the database, but this highly depends on your data set. For example, an int8 takes 8 bytes in a table, while it takes between 1 and ~20 bytes in a SQL dump.
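A rough average-bytes-per-row figure can also be derived from the catalog (a sketch; assumes the default 8192-byte block size and that VACUUM/ANALYZE has kept reltuples reasonably fresh):

```sql
-- Estimate average on-disk bytes per row for the largest tables,
-- including per-block and per-tuple overhead and any unused block space.
SELECT relname,
       relpages,
       reltuples,
       (relpages::float8 * 8192) / NULLIF(reltuples, 0) AS bytes_per_row
FROM pg_class
WHERE relkind = 'r'          -- ordinary tables only
ORDER BY relpages DESC
LIMIT 10;
```

Comparing that figure with the average row size in the dump shows where the 15x factor comes from.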

How the plain SQL dump becomes this small I cannot explain without many more details.
Hope this helps...

It was a huge help.
Glad it was,

- Joris

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
