Need advice for handling big data in postgres

Tobias Fielitz <tobias@xxxxxxxxxxxxxx> · Wed, 14 Jan 2015 11:42:45 +1100

Hi,

I am quite new to handle a lot of data in Postgres and I would be happy
to get as much advice from experienced data jongleurs as possible.

THE SCENARIO:
I have a big table `client_log`. It is the "main gate" for clients (and
there is a lot of them) to talk to the backend.
Each client sends more or less log lines, resulting in the table being
written lots of times every second producing a lot of data. Here is the
table definition:
http://stackoverflow.com/questions/25542010/postgres-huge-table-with-delayed-read-and-write-access
*code* determines the nature of the log line. Example: code=20
determines a location log line (the important information will be in
*latitude* and *longitude*), code=8999 carries a comment that just needs
to get stored (in the *comment* field).

Just storing the data is useless, I need to look at it. At some log
lines quite regularly, at some just once, at others every now and then.
I need indexes on nearly every single column for quick access. So what
is the `best` way of storing the data?

OPTION 1 - PARTITIONING:
For each query only a few columns are interesting and I could partition
the table (as it was suggested on SO)
by *created* and by *code*.
There is roughly 10 different codes and I would keep data for the last
two months (partitioned by day). So I would end up having 10 * 60 = 600
partitions.
For every partition I could create indexes that are relevant to that
partition (examples: *created*, *created_on_server* or *latitude* and
*longitude*).

OPTION 2 - MULTIPLE TABLES:
I could create the tables myself: one for location log lines, one for
comment log lines etc. and store them via python in the correct table
(depending on *code*). Each of these tables would only have the columns
and indexes needed.

OUTCOME:
I expect partitioning to be faster because Postgres selects the correct
partition for me automatically. I can easily get rid of old data by
dropping the corresponding partition. The downside of the partition
approach is that all partitions inherit all columns of the master table
which is unnecessary (and consumes disc space?).

Which option should I go for?
Is there anything else that I haven't considered?

Any help and comments appreciated.

PS.: Does partitioning work trouble-free with master-slave replication?
begin:vcard
fn:Tobias Fielitz
n:Fielitz;Tobias
adr:;;90 Pitt St;Sydney;NSW;2000;Australia
email;internet:tobias@xxxxxxxxxxxxxx
title:StreetHawk
tel;cell:0404267511
note:Skype: tobias.fielitz
url:streethawk.com
version:2.1
end:vcard

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general