
Re: Postgresql replication

Another tidbit I'd like to add. What has helped a lot in implementing high-latency master-master replication is writing our software with a business process model in mind, where data is not posted directly to the final tables. Instead, users are generally allowed to enter anything -- it could be incorrect, incomplete, or the user may not have rights -- and the data is still dumped into "pending" tables for people with rights to fix/review/approve later. Only after that process is the data posted to the final tables. (Good data entered on the first try still gets pended -- the validation phase simply assumes the user who entered the data is also the one who fixed/reviewed/approved it.)
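As a concrete illustration, here is a minimal sketch of the pending/final split. The table and column names (invoice_pending, invoice, status, payload) are made up for this example, not taken from our actual schema:

CREATE TABLE invoice_pending (
    id          varchar(11) PRIMARY KEY,   -- globally unique across servers (see the ID discussion quoted below)
    entered_by  varchar(32) NOT NULL,
    entered_at  timestamptz NOT NULL DEFAULT now(),
    status      varchar(8)  NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'approved', 'rejected', 'posted')),
    reviewed_by varchar(32),               -- NULL until somebody fixes/reviews/approves
    payload     text NOT NULL              -- the raw user entry, validated during review
);

CREATE TABLE invoice (
    id          varchar(11) PRIMARY KEY,
    posted_at   timestamptz NOT NULL DEFAULT now(),
    payload     text NOT NULL
);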

In terms of replication, this model allows users to enter data on any server. The pending records then get replicated to every server. Each specific server then looks at its own dataset of pendings to post to final tables. Final data is then replicated back to all the participating servers.
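As a sketch, the posting pass on each server might look like the following, assuming -- my assumption, not stated above -- that a server recognizes its own pendings by the server-code prefix on the ID ('A' on this server):

BEGIN;

-- Post approved pendings that this server owns...
INSERT INTO invoice (id, payload)
SELECT id, payload
FROM   invoice_pending
WHERE  status = 'approved'
  AND  id LIKE 'A%';

-- ...and mark them so they aren't posted twice.
UPDATE invoice_pending
SET    status = 'posted'
WHERE  status = 'approved'
  AND  id LIKE 'A%';

COMMIT;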

There may be a delay for the user if he/she is working on a server that doesn't have rights to post his data. However, the pending->post model gets users used to the idea that (1) all data is entered in one large swoop and validated/posted afterwards, and (2) data can/will sit in pending for a period of time until it is acted upon by somebody/some server with the proper authority. Hence users aren't expecting results to pop up on the screen the moment they press the submit button.




William Yu wrote:
Yes, it requires a lot of foresight to do multi-master replication -- especially across high-latency connections. I do that now for 2 different projects. We have servers across the country replicating data every X minutes, with custom app logic that resolves conflicting data.

Allocation of unique IDs that don't collide across servers is a must. For one project, instead of using numeric IDs, we use CHAR and prepend a unique server code, so record #1 on server A is A0000000001 versus the same sequence number under a different server code elsewhere. For the other project, we were too far along in development to change all our numerics into chars, so we wrote custom sequence logic to divide our 10-billion ID space into 1 to X billion for server 1, X to Y billion for server 2, etc.
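Rough sketches of both schemes (sequence names and range boundaries are invented for illustration):

-- Scheme 1: CHAR IDs with a prepended server code ('A' on this server).
CREATE SEQUENCE char_id_seq;
SELECT 'A' || lpad(nextval('char_id_seq')::text, 10, '0');   -- yields 'A0000000001'

-- Scheme 2: carve the shared numeric ID space into per-server ranges.
-- Here server 2 owns 2,000,000,001 through 4,000,000,000.
CREATE SEQUENCE range_id_seq
    START WITH 2000000001
    MINVALUE   2000000001
    MAXVALUE   4000000000
    NO CYCLE;   -- error out rather than wrap into another server's range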

With this step taken, we then had to isolate (1) transactions that could run on any server without issue (where we always take the newest record), (2) transactions that required an amalgam of all actions, and (3) transactions that had to be limited to "home" servers. Record keeping, where we keep a running history of all changes, fell into the first category. It would have been no different than 2 users on the same server updating the same object at different times during the day. Updating of summary data fell into category #2 and required parsing the change history of individual elements. Category #3 was financial transactions requiring strict locks; these were divided up by client/user space and restricted to the user's home server. This case would not allow auto-failover. Instead, it would require some prolonged threshold of downtime for a server before full financials are allowed on backup servers.
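For category (1), the "always take the newest record" rule can be enforced when replicated rows are applied. This is only a sketch -- the record_state table and its columns are my assumptions, and the INSERT ... ON CONFLICT syntax shown requires PostgreSQL 9.5 or later:

CREATE TABLE record_state (
    record_id  varchar(11) PRIMARY KEY,
    changed_at timestamptz NOT NULL,
    payload    text NOT NULL
);

-- Apply an incoming replicated row only if it is newer than what we have.
INSERT INTO record_state AS r (record_id, changed_at, payload)
VALUES ('A0000000001', '2005-03-01 12:00:00+00', 'replicated contents')
ON CONFLICT (record_id) DO UPDATE
SET changed_at = EXCLUDED.changed_at,
    payload    = EXCLUDED.payload
WHERE EXCLUDED.changed_at > r.changed_at;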

