Search Postgresql Archives

Re: Streaming replication and sharding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



There's also one project called Postgres-XC which seems very promising, but I'm kind of wary about its level of maturity.

 

On Monday, February 18, 2013 08:00:33 AM Albe Laurenz wrote:

> Tiemo Kieft wrote:

>

> > We are developing an application that uses various web analytics packages

> > (like Google Analytics) to

run analyses on. We are currently in closed

> > beta stadium where we don't have a lot of data in the database, but at

> > some point it will grow considerably.

> >

> > We basically have two different sets of data, on the one hand we have raw

> > metrics from the datasource,

and on the other hand we have account and

> > meta information. The former can be re-downloaded at any time, and will

> > grow to quite large sizes. The latter set is the one that we really care

> > about, and don't want to risk losing.

> >

> > Currently we plan on using streaming replication to replicate all data to

> > at least one slave, for the

near future this will do, since we can run

> > some of the large (read-only) aggregation queries on the slave database.

> > In the future the dataset might grow to the point where we need to start

> > thinking about sharding. The analytics data can be sharded on a

> > per-customer basis, and doesn't have to be replicated.

> >

> > Since Postgres doesn't support per-table streaming replication (as far as

> > I can tell), the only

solution would be to run two separate instances of

> > postgres per server. One instance is replicated to all servers, and will

> > contain account and other important information. The other instance is

> > used to store analytics data. Is this a viable way of solving this

> > problem, or are we overlooking something?

> > The problem is not really immediate, as the dataset is currently small

> > enough to fit on one machine

(and replicated to a second), just want to

> > be future proof, and get this solved before the problems start.

>

>

> The problems I see with distributing your data across

> several PostgreSQL clusters is that they become disconnected.

>

> It will become much more difficult to keep them consistent:

>

> You cannot have referential integrity, and if a database restore

> is needed, you have to make extra provisions that you can restore

> your system to a consistent state across all clusters.

>

> You also lose the ability to join between tables that are

> distributed across different databases. This can be a

> perdormance problem, particularly in an OLAP scenario.

>

> That's all I can think of at the moment.

>

> Yours,

> Laurenz Albe


--

René Romero Benavides @iCodeiExist @PgsqlMx

 

Postgresql Tips en español para la comunidad de México e Hispanoamérica.

http://postgresql.org.mx

 

 

 

 

 

 


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux