There's also one project called Postgres-XC which seems very promising, but I'm kind of wary about its level of maturity.
On Monday, February 18, 2013 08:00:33 AM Albe Laurenz wrote: > Tiemo Kieft wrote: > > > We are developing an application that uses various web analytics packages > > (like Google Analytics) to run analyses on. We are currently in closed > > beta stadium where we don't have a lot of data in the database, but at > > some point it will grow considerably. > > > > We basically have two different sets of data, on the one hand we have raw > > metrics from the datasource, and on the other hand we have account and > > meta information. The former can be re-downloaded at any time, and will > > grow to quite large sizes. The latter set is the one that we really care > > about, and don't want to risk losing. > > > > Currently we plan on using streaming replication to replicate all data to > > at least one slave, for the near future this will do, since we can run > > some of the large (read-only) aggregation queries on the slave database. > > In the future the dataset might grow to the point where we need to start > > thinking about sharding. The analytics data can be sharded on a > > per-customer basis, and doesn't have to be replicated. > > > > Since Postgres doesn't support per-table streaming replication (as far as > > I can tell), the only solution would be to run two separate instances of > > postgres per server. One instance is replicated to all servers, and will > > contain account and other important information. The other instance is > > used to store analytics data. Is this a viable way of solving this > > problem, or are we overlooking something? > > The problem is not really immediate, as the dataset is currently small > > enough to fit on one machine (and replicated to a second), just want to > > be future proof, and get this solved before the problems start. > > > The problems I see with distributing your data across > several PostgreSQL clusters is that they become disconnected. > > It will become much more difficult to keep them consistent: > > You cannot have referential integrity, and if a database restore > is needed, you have to make extra provisions that you can restore > your system to a consistent state across all clusters. > > You also lose the ability to join between tables that are > distributed across different databases. This can be a > perdormance problem, particularly in an OLAP scenario. > > That's all I can think of at the moment. > > Yours, > Laurenz Albe
René Romero Benavides @iCodeiExist @PgsqlMx
Postgresql Tips en español para la comunidad de México e Hispanoamérica.
|