>-----Original Message----- >From: Ross Boylan [mailto:RossBoylan@xxxxxxxxxxxxxxxxxx] >Sent: Sunday, 30 March 2008 23:43 >To: Joris Dobbelsteen >Cc: RossBoylan@xxxxxxxxxxxxxxxxxx; pgsql-general@xxxxxxxxxxxxxx >Subject: Re: database 1.2G, pg_dump 73M?! > >On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote: >> From the top contenders, about half are indexes, so you are stuck >> with ~200 MB of data in the tables. >> Postgresql has some wasted space due to placement of the tuples in a >> block and overhead for each block and row. I don't know >those values, >> but they are in the range of 24 bytes per tuple, I believe. >Secondly a >> block is 8 KB by default and tuples cannot be stored into multiple >> blocks (thats what toast to work around). >> >> All in all: Lookup tuple sizes, if they are small than the overhead >> from postgresql can be a big factor. If you are huge you loose on >> portions of unoccupied space in blocks. I believe pg_statistics will >> provide this information. >There is a pg_statistic (no "s") table, but I don't know how >to get tuple size from it--the documentation refers to the >source code to figure out the codes. Backing up a step, I >don't know what a tuple is in Postgres, and don't see an entry >for it in the index. It was pg_stats. You get avg_width. It gives this per column. So probably you want SELECT tablename, SUM(avg_width) FROM pg_stats WHERE schemaname = 'public' GROUP BY tablename; >Is a tuple just a row? That's what the docs say for the following >report: Yes. Also where I typed "block" it will refer to "page". ># select distinct relname, reltuples, relpages from pg_class >where relkind='r' and substring(relname, 1, 3) != 'pg_'; > > relname | reltuples | relpages >-------------------------+-----------+---------- > basefiles | 0 | 0 > cdimages | 0 | 0 > client | 2 | 1 > counters | 1 | 1 > device | 0 | 0 > file | 650659 | 11558 > filename | 623012 | 3958 > fileset | 22 | 1 > job | 384 | 10 > jobmedia | 596 | 7 > location | 0 | 0 > locationlog | 0 | 0 > log | 0 | 0 > media | 245 | 9 > mediatype | 2 | 1 > path | 67908 | 966 > pool | 5 | 1 > sql_features | 439 | 6 > sql_implementation_info | 12 | 1 > sql_languages | 4 | 1 > sql_packages | 10 | 1 > sql_parts | 9 | 1 > sql_sizing | 23 | 1 > sql_sizing_profiles | 0 | 0 > status | 19 | 1 > storage | 2 | 1 > unsavedfiles | 0 | 0 > version | 1 | 1 Do relpages * 8096 / reltuples. Default installs have 8 KB pages/blocks by default. For file it should be ~144 bytes/tuple. For filename it should be ~51 bytes/tuple. Probably you will get some signficant differences here. >> Another factor is representation in the SQL dump might be more >> efficient than in the database, but this highly depends on your data >> set. For example, a int8 takes 8 bytes in a table, while it takes >> between 1 and ~20 in a SQL dump. >> >> How the plain SQL dump becomes this small I cannot explain without >> much much more details. > >On Tom's point, bacula regularly inserts entries into the >tables and then, days to months later, deletes them. As far >as I know, the VACUUM FULLs I just did were the first ever; I >did do several of them because I kept getting messages about >needing more fsm_pages. > >I am still trying to figure out if the database was getting >any automatic vacuuming at all. The Postgres documentation >(the database is 8.2, though I'm moving to 8.3 soon) sounds as >if it's on automatically, but the Debian-specific >documentation suggests I may need to do some additional things >to enable it. > >Probably the fsm_pages being low also hurt disk useage, since >the message accompanying the vacuum said that's what happens >if fsm_pages is low. It was 20k; vacuum said I needed 56k, >and I upped it to 80k. >I'm not sure if my recent cleaning has brought the needed >fsm_pages down. > >I've only been doing partial backups for the last few months, >so there's probably less info in the tables than under normal >conditions. I suppose it's possible the space I gained was >just a temporary win. Looks like some configuration changes are needed to tune your installation to better suite you needs. fsm_pages can be modified, but I don't know if any other parameters might need change for that. Documentation can help and probably a lot of people here can do and tell it blindfolded. - Joris -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general