
Re: database 1.2G, pg_dump 73M?!

>-----Original Message-----
>From: Ross Boylan [mailto:RossBoylan@xxxxxxxxxxxxxxxxxx] 
>Sent: Sunday, 30 March 2008 23:43
>To: Joris Dobbelsteen
>Cc: RossBoylan@xxxxxxxxxxxxxxxxxx; pgsql-general@xxxxxxxxxxxxxx
>Subject: Re:  database 1.2G, pg_dump 73M?!
>
>On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote:
>>  From the top contenders, about half are indexes, so you are stuck 
>> with ~200 MB of data in the tables.
>> PostgreSQL has some wasted space due to the placement of tuples in a 
>> block and overhead for each block and row. I don't know those values, 
>> but they are in the range of 24 bytes per tuple, I believe. Secondly, 
>> a block is 8 KB by default and tuples cannot be stored across 
>> multiple blocks (that's what TOAST works around).
>> 
>> All in all: look up tuple sizes; if they are small, then the overhead 
>> from PostgreSQL can be a big factor. If they are huge, you lose space 
>> to unoccupied portions of blocks. I believe pg_statistics will 
>> provide this information.
>There is a pg_statistic (no "s")  table, but I don't know how 
>to get tuple size from it--the documentation refers to the 
>source code to figure out the codes.  Backing up a step, I 
>don't know what a tuple is in Postgres, and don't see an entry 
>for it in the index.

It was pg_stats (a view, not the pg_statistic table).
It gives avg_width, the average column width in bytes, per column.

So probably you want
SELECT tablename, SUM(avg_width)
FROM pg_stats
WHERE schemaname = 'public'
GROUP BY tablename;
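If it helps, the per-column widths can be combined with the row-count
estimates in pg_class to sketch the user-data volume per table. This is
only an approximation: the statistics are as fresh as the last ANALYZE,
and per-tuple and per-page overhead is not counted.

```sql
-- Estimated user-data bytes per table: average row width times row count.
-- Both figures are planner estimates, refreshed by ANALYZE (or VACUUM ANALYZE).
SELECT s.tablename,
       SUM(s.avg_width)               AS est_bytes_per_row,
       SUM(s.avg_width) * c.reltuples AS est_table_bytes
FROM pg_stats s
JOIN pg_class c     ON c.relname = s.tablename
JOIN pg_namespace n ON n.oid = c.relnamespace
                   AND n.nspname = s.schemaname
WHERE s.schemaname = 'public'
  AND c.relkind = 'r'
GROUP BY s.tablename, c.reltuples
ORDER BY est_table_bytes DESC;
```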

>Is a tuple just a row?  That's what the docs say for the following
>report:

Yes.
Also, where I typed "block", read "page".

># select distinct relname, reltuples, relpages from pg_class 
>where relkind='r' and substring(relname, 1, 3) != 'pg_';
>
>         relname         | reltuples | relpages 
>-------------------------+-----------+----------
> basefiles               |         0 |        0
> cdimages                |         0 |        0
> client                  |         2 |        1
> counters                |         1 |        1
> device                  |         0 |        0
> file                    |    650659 |    11558
> filename                |    623012 |     3958
> fileset                 |        22 |        1
> job                     |       384 |       10
> jobmedia                |       596 |        7
> location                |         0 |        0
> locationlog             |         0 |        0
> log                     |         0 |        0
> media                   |       245 |        9
> mediatype               |         2 |        1
> path                    |     67908 |      966
> pool                    |         5 |        1
> sql_features            |       439 |        6
> sql_implementation_info |        12 |        1
> sql_languages           |         4 |        1
> sql_packages            |        10 |        1
> sql_parts               |         9 |        1
> sql_sizing              |        23 |        1
> sql_sizing_profiles     |         0 |        0
> status                  |        19 |        1
> storage                 |         2 |        1
> unsavedfiles            |         0 |        0
> version                 |         1 |        1

Do relpages * 8192 / reltuples.
Default installs use 8 KB (8192-byte) pages/blocks.

For file that gives ~146 bytes/tuple.
For filename that gives ~52 bytes/tuple.

Probably you will see some significant differences here.
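As a sanity check, that division can be run directly on the
relpages/reltuples figures from the pg_class output above (assuming the
default 8192-byte page size; a custom build could differ):

```python
# Rough bytes-per-tuple estimate: pages * page_size / tuples.
# Figures are taken from the pg_class output above.
PAGE_SIZE = 8192  # PostgreSQL's default block size

def bytes_per_tuple(relpages: int, reltuples: int) -> float:
    """Average on-disk bytes per row, including page and tuple overhead."""
    return relpages * PAGE_SIZE / reltuples

print(round(bytes_per_tuple(11558, 650659)))  # file table
print(round(bytes_per_tuple(3958, 623012)))   # filename table
```

Comparing these figures with the SUM(avg_width) of each table shows how
much of the space is overhead and slack rather than data.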

>> Another factor is that the representation in the SQL dump might be 
>> more efficient than in the database, but this highly depends on your 
>> data set. For example, an int8 takes 8 bytes in a table, while it 
>> takes between 1 and ~20 bytes in a SQL dump.
>> 
>> How the plain SQL dump becomes this small I cannot explain without 
>> much more detail.
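To make the int8 point concrete: a plain-text dump stores the decimal
text of each value, so its width depends on the value, while the binary
column is always 8 bytes. A minimal illustration:

```python
# A SQL dump writes numbers as decimal text, so their size varies with
# the value; the binary int8 column is a fixed 8 bytes on disk.
def dump_width(value: int) -> int:
    """Characters needed to represent an int8 value in a plain-text dump."""
    return len(str(value))

print(dump_width(7))                     # a single digit: 1 character
print(dump_width(-9223372036854775808))  # int8 minimum: 20 characters
```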
>
>On Tom's point, bacula regularly inserts entries into the 
>tables and then, days to months later, deletes them.  As far 
>as I know, the VACUUM FULLs I just did were the first ever; I 
>did do several of them because I kept getting messages about 
>needing more fsm_pages.
>
>I am still trying to figure out if the database was getting 
>any automatic vacuuming at all.  The Postgres documentation 
>(the database is 8.2, though I'm moving to 8.3 soon) sounds as 
>if it's on automatically, but the Debian-specific 
>documentation suggests I may need to do some additional things 
>to enable it.
>
>Probably the fsm_pages being low also hurt disk usage, since 
>the message accompanying the vacuum said that's what happens 
>if fsm_pages is low.  It was 20k; vacuum said I needed 56k, 
>and I upped it to 80k.
>I'm not sure if my recent cleaning has brought the needed 
>fsm_pages down.
>
>I've only been doing partial backups for the last few months, 
>so there's probably less info in the tables than under normal 
>conditions.  I suppose it's possible the space I gained was 
>just a temporary win.

Looks like some configuration changes are needed to tune the
installation to better suit your needs. fsm_pages can be modified, but I
don't know whether any other parameters need to change along with it.
The documentation can help, and probably a lot of people here could do
it blindfolded.
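For reference, a sketch of the relevant postgresql.conf lines on
8.2/8.3. The 80000 figure is just the value mentioned above; VACUUM
VERBOSE reports the number actually needed, and the max_fsm_* settings
were removed entirely in 8.4 when the free space map became automatic.

```
# postgresql.conf (8.2/8.3) -- free space map sizing; changing these
# requires a server restart.
max_fsm_pages = 80000       # size to at least VACUUM VERBOSE's "pages needed" hint
max_fsm_relations = 1000    # one slot per table/index whose free space is tracked
```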

- Joris

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

