Tiemo Kieft wrote: > We are developing an application that uses various web analytics packages (like Google Analytics) to > run analyses on. We are currently in closed beta stadium where we don't have a lot of data in the > database, but at some point it will grow considerably. > > We basically have two different sets of data, on the one hand we have raw metrics from the datasource, > and on the other hand we have account and meta information. The former can be re-downloaded at any > time, and will grow to quite large sizes. The latter set is the one that we really care about, and > don't want to risk losing. > > Currently we plan on using streaming replication to replicate all data to at least one slave, for the > near future this will do, since we can run some of the large (read-only) aggregation queries on the > slave database. In the future the dataset might grow to the point where we need to start thinking > about sharding. The analytics data can be sharded on a per-customer basis, and doesn't have to be > replicated. > > Since Postgres doesn't support per-table streaming replication (as far as I can tell), the only > solution would be to run two separate instances of postgres per server. One instance is replicated to > all servers, and will contain account and other important information. The other instance is used to > store analytics data. Is this a viable way of solving this problem, or are we overlooking something? > > The problem is not really immediate, as the dataset is currently small enough to fit on one machine > (and replicated to a second), just want to be future proof, and get this solved before the problems > start. The problems I see with distributing your data across several PostgreSQL clusters is that they become disconnected. It will become much more difficult to keep them consistent: You cannot have referential integrity, and if a database restore is needed, you have to make extra provisions that you can restore your system to a consistent state across all clusters. You also lose the ability to join between tables that are distributed across different databases. This can be a perdormance problem, particularly in an OLAP scenario. That's all I can think of at the moment. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general