All, * FarjadFarid(ChkNet) (farjad.farid@xxxxxxxxxxxxxxxxx) wrote: > Tom, thanks for your unbiased detailed response. > > Interesting post. Please don't top-post. My comments are in-line, below. > -----Original Message----- > From: pgsql-general-owner@xxxxxxxxxxxxxx > [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Tom Lane > Sent: 22 February 2016 05:06 > To: Andrew Smith > Cc: pgsql-general@xxxxxxxxxxxxxx > Subject: Re: Why is my database so big? > > Andrew Smith <laconical@xxxxxxxxx> writes: > > I am setting up a proof of concept database to store some historical data. > > Whilst I've used PostgreSQL a bit in the past this is the first time > > I've looked into disk usage due to the amount of data that could > > potentially be stored. I've done a quick test and I'm a little > > confused as to why it is occupying so much space on disk. Here is my table > definition: > > > CREATE TABLE "TestSize" > > ( > > "Id" integer NOT NULL, > > "Time" timestamp without time zone NOT NULL, > > "Value" real NOT NULL, > > "Status" smallint NOT NULL, > > PRIMARY KEY ("Id", "Time") > > ); > > > CREATE INDEX test_index ON "TestSize" ("Id"); Note that you don't really need an index on "Id" because including a primary key will automatically include an index on those fields, and an index over ("Id", "Time") can be used to satisfy queries which have a conditional on just the "Id" column. Removing that extra index will likely help with space issues. > > With a completely empty table the database is 7 MB. After I insert 1 > > million records into the table the database is 121 MB. My > > understanding is that each of the fields is sized as follows: > > > integer - 4 bytes > > timestamp without time zone - 8 bytes > > real - 4 bytes > > smallint - 2 bytes I'd recommend against using timestamp w/o time zone. For starters, as noted, it's not actually saving you any space over timestamp w/ time zone, and second, it makes working with that field painful and prone to error. > The long and the short of it is that Postgres is more oriented to OLTP-style > applications where access to and update of individual rows is the key > performance metric. If you're dealing with historical data and mainly want > aggregated query results, it's possible you'd get better performance and > more-compact storage from a column-store database. > > There's ongoing investigation into extending Postgres to support > column-style storage for better support of applications like that; but any > such feature is probably several years away, and it will not come without > performance compromises of its own. One approach to dealing with the PG per-row overhead is to use arrays instead of rows, when you don't need that per-row visibility information. I've found that to be very successful for single-column tables, but the technique may work reasonably well for other structures also. Thanks! Stephen
Attachment:
signature.asc
Description: Digital signature