On 26/07/12 04:09, McGehee, Robert wrote:
Very interesting points. Thanks for the documentation link and the point about alignment.
As a test of Tom's suggestion to group smallints together to avoid alignment problems, I changed the column order from
smallint, date, smallint, integer, real TO
smallint, smallint, date, integer, real
This resulted in a "Small" table that is 15% smaller than the original "Small" table (414MB vs 487MB). I wasn't aware that tables could be optimized by switching column order like this. This could be a good note to make in either the "Data Types" portion of the PostgreSQL manual or the "Performance Optimization" portion of the PostgreSQL Wiki.
One might even imagine a future version of PostgreSQL using an efficient disk layout that may not match the table layout in order to avoid wasted space from padding. I suppose this already happens to some extent with the different storage modes (plain, extended, external).
Steve also correctly pointed out that my text string probably only takes up 3-4 bytes rather than 7 bytes. Meaning the "Small" table uses only 7-8 bytes/row less than the "Big" table. For 10M rows, the expected savings should be about 70-80MB. This matches the 79MB of savings I see once I reorder the columns following Tom's suggestion. All is right in the world!
Thanks to all,
Robert
SC> More questions than answers:
SC> What version of PostgreSQL?
9.1.1 (I should have mentioned that)
SC> How are your determining the space used by a table?
\d+
SC> Why are you assuming 7 bytes for a 3-character value? (Character values
SC> up to 126 characters long only have 1-character overhead.)
You are correct. I read the wrong documentation (docs for 8.2 says overhead is 4 bytes, but docs for my 9.1 version says overhead is only one byte for small strings). So I believe I should assume only 4 bytes of total usage here, maybe less if it's compressed.
SC> What is the fill-factor on the tables? (Should default to 100% but don't
SC> know how you are configured.)
I'm using 100%, but it shouldn't matter as I was giving sizes without the index anyway.
SC> Do the tables have OIDs or not?
No
-----Original Message-----
From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]
Sent: Tuesday, July 24, 2012 10:00 PM
To: Adrian Klaver
Cc: McGehee, Robert; pgsql-general@xxxxxxxxxxxxxx
Subject: Re: Smaller data types use same disk space
Adrian Klaver <adrian.klaver@xxxxxxxxx> writes:
On 07/24/2012 03:21 PM, McGehee, Robert wrote:
I've created two tables labeled "Big" and "Small" that both store the same 10 million rows of data using 493MB and 487MB of disk space respectively. The difference is that the "Big" table uses data types that take up more space (integer rather than smallint, float rather than real, etc). The "Big" table should need about 27 bytes/row versus 16 bytes/row for the "Small" table, indicating to me that the "Big" table should be 70% bigger in actual disk size. In reality, it's only 1% bigger or 6MB (after clustering, vacuuming and analyzing). Why is this? Shouldn't the "Small" table be about 110MB smaller (11 bytes for 10 million rows)? I'm estimating table size with \d+
See here for the gory details:
http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html
See in particular:
Table 55-4. HeapTupleHeaderData Layout
From the text:
""All table rows are structured in the same way. There is a fixed-size
header (occupying 23 bytes on most machines.." which breaks you
assumption of the Big/Small row size comparison.
Aside from the tuple header, there are alignment considerations that you
have to allow for. So for instance, if your columns are int, smallint,
int, that will take the same amount of space as 3 ints, because the
savings disappears into alignment of the third int. You need two
adjacent smallints to get any benefit.
regards, tom lane
Curious, does changing the order to:
date, real, integer, smallint, smallint
make any difference?
More specifically, putting the types into decreasing order of the size
of alignment requirements - i.e. putting types which require 8 byte
alignment before those requiring 4 byte alignment...
Cheers,
Gavin
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general