>-----Original Message----- >From: Ross Boylan [mailto:RossBoylan@xxxxxxxxxxxxxxxxxx] >Sent: Monday, 31 March 2008 0:23 >To: Joris Dobbelsteen >Cc: RossBoylan@xxxxxxxxxxxxxxxxxx; pgsql-general@xxxxxxxxxxxxxx >Subject: RE: database 1.2G, pg_dump 73M?! > > >On Sun, 2008-03-30 at 22:59 +0100, Joris Dobbelsteen wrote: >> >-----Original Message----- >> >From: Ross Boylan [mailto:RossBoylan@xxxxxxxxxxxxxxxxxx] >> >Sent: Sunday, 30 March 2008 23:43 >> >To: Joris Dobbelsteen >> >Cc: RossBoylan@xxxxxxxxxxxxxxxxxx; pgsql-general@xxxxxxxxxxxxxx >> >Subject: Re: database 1.2G, pg_dump 73M?! >> > >> >On Sun, 2008-03-30 at 21:22 +0200, Joris Dobbelsteen wrote: >> >> From the top contenders, about half are indexes, so you >are stuck >> >> with ~200 MB of data in the tables. >> >> Postgresql has some wasted space due to placement of the >tuples in >> >> a block and overhead for each block and row. I don't know >> >those values, >> >> but they are in the range of 24 bytes per tuple, I believe. >> >Secondly a >> >> block is 8 KB by default and tuples cannot be stored into >multiple >> >> blocks (thats what toast to work around). >> >> >> >> All in all: Lookup tuple sizes, if they are small than >the overhead >> >> from postgresql can be a big factor. If you are huge you loose on >> >> portions of unoccupied space in blocks. I believe pg_statistics >> >> will provide this information. >> >There is a pg_statistic (no "s") table, but I don't know >how to get >> >tuple size from it--the documentation refers to the source code to >> >figure out the codes. Backing up a step, I don't know what a tuple >> >is in Postgres, and don't see an entry for it in the index. >> >> It was pg_stats. >> You get avg_width. It gives this per column. >> >> So probably you want >> SELECT tablename, SUM(avg_width) >> FROM pg_stats >> WHERE schemaname = 'public' >[Thanks; I didn't know about using schemaname to limit it to >interesting tables] >> GROUP BY tablename [RB added] ORDER BY tablename; > tablename | sum >-----------+----- > client | 62 > counters | 25 > file | 109 > filename | 18 > fileset | 53 > job | 149 > jobmedia | 52 > media | 226 > mediatype | 16 > path | 82 > pool | 179 > status | 29 > storage | 23 > version | 4 > >So, for example, if each tuple has 24 bytes of overhead, the >overhead more than doubles the size of the file table (18 >bytes), which has a big record count. Am I following correctly? Yes. (Note not to pin down on the 24 bytes, it varies between versions. I think this is close enough however). >Between the space taken up by indices and the other overhead, >the size difference between the sql dump and the db disk >useage is starting to seem more reasonable. > >The database uses SQL-ASCII encoding, so I'd expect the >textual parts (filenames and paths) to take up the same space >(just for the basic storage, not counting overhead/indices) in >Postgres as on the dump, 1 byte/character. That's the idea. >I'm not sure what "If you are huge you loose on portions of >unoccupied space in blocks" means. A tuple has to be in exactly 1 page (or block). It cannot span multiple pages. Hence if your tuples happens to be a half page size or slightly larger, than only a single one will fit in a single page. So you waste half the page. You don't have to worry about this, as your tuples are significantly smaller than that. (As a reference only: if a tuple is larger than a blocksize some attributes will be evicted to the toast table or compressed. So that is to overcome the limitations of 8000 bytes per tuple.) [snip] >> >> ># select distinct relname, reltuples, relpages from pg_class where >> >relkind='r' and substring(relname, 1, 3) != 'pg_'; >> > >> > relname | reltuples | relpages >> >-------------------------+-----------+---------- >> > basefiles | 0 | 0 >> > cdimages | 0 | 0 >> > client | 2 | 1 >> > counters | 1 | 1 >> > device | 0 | 0 >> > file | 650659 | 11558 >> > filename | 623012 | 3958 >> > fileset | 22 | 1 >> > job | 384 | 10 >> > jobmedia | 596 | 7 >> > location | 0 | 0 >> > locationlog | 0 | 0 >> > log | 0 | 0 >> > media | 245 | 9 >> > mediatype | 2 | 1 >> > path | 67908 | 966 >> > pool | 5 | 1 >> > sql_features | 439 | 6 >> > sql_implementation_info | 12 | 1 >> > sql_languages | 4 | 1 >> > sql_packages | 10 | 1 >> > sql_parts | 9 | 1 >> > sql_sizing | 23 | 1 >> > sql_sizing_profiles | 0 | 0 >> > status | 19 | 1 >> > storage | 2 | 1 >> > unsavedfiles | 0 | 0 >> > version | 1 | 1 >> >> Do relpages * 8096 / reltuples. >> Default installs have 8 KB pages/blocks by default. >> >> For file it should be ~144 bytes/tuple. >> For filename it should be ~51 bytes/tuple. > >Those tuple size estimates seem consistent with the tuple >sizes reported earlier when the overhead is added in. I'm impressed! > [snip] - Joris -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general