Re: Replication

Gerry Reno <greno@xxxxxxxxxxx> · Mon, 22 Jun 2009 21:29:22 -0400

Craig Ringer wrote:

  On Mon, 2009-06-22 at 20:48 -0400, Gerry Reno wrote:

      Anyway, you seem to be unaware that built-in replication for 
PostgreSQL already is moving along, with an implementation that's just 
not quite production quality yet, and might make it into the next version 
after 8.4 if things go well. 

    No, I'm aware of this basic builtin replication. It was rather 
disappointing to see it moved out of the 8.4 release. We need something 
more than just basic master-slave replication, which is all this simple 
builtin replication will provide. We need a real replication solution 
that can handle statement-based and row-based replication. Multi-master 
replication. Full cyclic replication chain setups. Simple master-slave 
just doesn't cut it.

Statement-based replication is, frankly, scary.

Personally I'd only be willing to use it if the database would guarantee
to throw an exception when any statement that may produce different
results on master and slave(s) was issued, like the
limit-without-order-by case mentioned in the MySQL replication docs.

I don't know how it could guarantee that.  That's really why row-based
is better.
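
To make the limit-without-order-by problem concrete, here is a toy sketch in Python. It is not how any real database implements replication. It assumes the master and slave hold the same rows in a different physical order, and the helper delete_limit is made up purely for illustration.

```python
def delete_limit(rows, predicate, limit):
    """Emulate DELETE ... WHERE <predicate> LIMIT n with no ORDER BY.

    Removes the first `limit` matching rows in physical (list) order and
    returns the rows that were removed.
    """
    victims = [r for r in rows if predicate(r)][:limit]
    for r in victims:
        rows.remove(r)
    return victims


def is_group_one(row):
    return row[1] == 1


# Same logical contents, different physical order (an assumption of this sketch).
master = [("a", 1), ("b", 1), ("c", 2)]
slave = [("b", 1), ("a", 1), ("c", 2)]

# Statement-based: the slave re-executes the same statement on its own copy.
sb_slave = list(slave)
deleted_on_master = delete_limit(master, is_group_one, 1)
delete_limit(sb_slave, is_group_one, 1)

# Row-based: the slave applies exactly the row changes the master made.
rb_slave = list(slave)
for row in deleted_on_master:
    rb_slave.remove(row)

statement_based_consistent = sorted(sb_slave) == sorted(master)
row_based_consistent = sorted(rb_slave) == sorted(master)
print("statement-based consistent:", statement_based_consistent)
print("row-based consistent:", row_based_consistent)
```

In this toy setup the statement-based slave deletes a different row than the master did, while the row-based slave ends up with the same contents.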

  Even then I don't really understand how it can produce consistent
replicas in the face of, say, two concurrent statements both pulling
values from a sequence. There would need to be some sort of side channel
to allow the master to tell the slave about how it allocated values from
the sequence.

Sequences I deal with by setting up an offset and increment for each
replica so that there are no conflicts.

You have to know the entire replication array size prior to setup.  I
usually set increment to 10 and then I can offset up to 10 replicas.
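
As a rough sketch of that offset/increment scheme (in Python rather than actual sequence DDL; the name replica_sequence and the offsets 1 through 10 are just illustrative assumptions):

```python
from itertools import islice


def replica_sequence(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


INCREMENT = 10  # must be at least the number of replicas, fixed before setup

# One generator per replica, with offsets 1..10 (hypothetical numbering).
replicas = {
    offset: replica_sequence(offset, INCREMENT)
    for offset in range(1, INCREMENT + 1)
}

# Take the first three values each replica would hand out.
allocated = {offset: list(islice(seq, 3)) for offset, seq in replicas.items()}
for offset, values in allocated.items():
    print(f"replica {offset}: {values}")

# No value is handed out by more than one replica.
all_values = [v for values in allocated.values() for v in values]
assert len(all_values) == len(set(all_values))
```

Each replica only ever produces values congruent to its own offset modulo the increment, which is why the array size has to be known up front: an eleventh replica would have no free offset left.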

  My overall sentiment is "ick".

Re multi-master replication, out of interest: what needs does it satisfy
for you that master-slave doesn't?

- Scaling number of clients / read throughput in read-mostly workloads?

yes

  - Client-transparent fault-tolerance?

yes.

  - ... ?

What limitations of master-slave replication with read-only slaves
present roadblocks for you?

failure of single master.  

  - Client must connect to master for writes, otherwise master or slave,
  so must be more aware of connection management

- Client drivers have no way to transparently discover active master,
  must be told master hostname/ip

- ... ?

I personally find it difficult to understand how multi-master
replication can add much to throughput on write-heavy workloads. DBs are
often I/O limited after all, and if each master must write all the
others' changes you may not see much of a performance win in write heavy
environments. So: I presume multi-master replication is useful mainly in
read-mostly workloads? Or do you expect throughput gains in write-heavy
workloads too?

If the latter, is it really multiple master replication you want rather
than a non-replica clustered database, where writes to one node don't
get replicated to the other nodes, they just get notified via some sort
of cache coherence protocol?

I guess my point is that personally I think it'd be helpful to know
_why_ you need more than what's on offer. What specific features pose
problems or would benefit you, how, and why. Etc.

      That's probably why it's not on the survey--everybody knows that's 
important and it's already being worked on actively.

    Ok, I just felt it should still be there. But, I hope development 
understands just how important good replication really is.

"development" appear to be well aware. They're also generally very
willing to accept help, testing, and users who're willing to trial early
efforts. Hint, hint. Donations of paid developer time to work on a
project you find to be commercially important probably wouldn't go
astray either.

Regards,

Gerry