Re: Reliability with RAID 10 SSD and Streaming Replication

Cuong Hoang <climbingrose@xxxxxxxxx> · Fri, 17 May 2013 09:52:00 +1000

Thank you for your advice, guys. We'll definitely turn off the init.d script for PostgreSQL on the master. The standby host will be disk-based, so it will be less vulnerable to power loss.

I forgot to mention that we'll set up WAL-E to ship base backups and WAL files to Amazon S3 continuously as another safety measure. Again, the loss of a few WAL files would not be a big issue for us.
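To put "a few WAL files" in terms of time: an archiver such as WAL-E is invoked through archive_command once per completed WAL segment (16 MB by default), so the S3 copy can trail the master by up to one segment, or by archive_timeout if that is set to force segment switches. The helper below is a hypothetical back-of-the-envelope sketch that assumes a constant WAL generation rate and ignores S3 upload time.

```python
from typing import Optional

# PostgreSQL's default WAL segment size, in MB.
WAL_SEGMENT_MB = 16


def worst_case_unarchived_seconds(wal_mb_per_sec: float,
                                  archive_timeout_s: Optional[float]) -> float:
    """Hypothetical helper: upper bound on how many seconds of WAL may not yet
    have reached the archive. Assumes a constant WAL rate, no upload latency."""
    # Time for the current segment to fill up and be handed to archive_command.
    if wal_mb_per_sec > 0:
        fill_seconds = WAL_SEGMENT_MB / wal_mb_per_sec
    else:
        fill_seconds = float("inf")
    # archive_timeout of 0 (or unset) means no forced segment switches.
    if not archive_timeout_s:
        return fill_seconds
    return min(fill_seconds, archive_timeout_s)


# A write-heavy master producing 4 MB/s of WAL fills a segment every 4 s.
print(worst_case_unarchived_seconds(4.0, 60))  # 4.0
# A quiet master is bounded by archive_timeout instead.
print(worst_case_unarchived_seconds(0.1, 60))  # 60
```

For a write-heavy workload the segment fills quickly, so the archive gap is small; archive_timeout mainly matters during quiet periods.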

Do you think that this setup will be acceptable for our purposes?

Thanks,
Cuong

On Fri, May 17, 2013 at 8:39 AM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

On Thu, May 16, 2013 at 11:46 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:

On Thu, May 16, 2013 at 1:34 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

> On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang <climbingrose@xxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> Our application is write-heavy and I/O utilisation has been a problem for
>> us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840 Pro for
>> the master server. I'm aware of the write cache issue on SSDs in case of power
>> loss. However, our hosting provider doesn't offer any SSD models with a
>> supercapacitor. To minimise risk, we will also set up another server with
>> RAID 10 SAS drives as a streaming replication standby. For our application,
>> a few seconds of data loss is acceptable.
>>
>> My question is, would corrupted data files on the primary server affect
>> the streaming standby? In other words, is this setup acceptable in terms of
>> minimising the deficiencies of SSDs?
>
> That seems rather scary to me for two reasons.
>
> If the data center has a sudden power failure, why would it not take out
> both machines either simultaneously or in short succession? Can you verify
> that the hosting provider does not have them on the same UPS (or, even worse,
> as two virtual machines on the same physical host)?

I took it to mean that his standby's "RAID 10 SAS" meant a standby based on spinning disk drives.

I had not considered that. If the master can't keep up with its I/O load on disk drives, wouldn't a replica on the same kind of drives fall further and further behind trying to replay the same workload?
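A simple constant-rate model shows why "further and further behind" is the right worry: if the standby replays WAL more slowly than the master generates it, the backlog grows linearly with no upper bound. The rates below are purely hypothetical, and real replay speed depends on the standby's random I/O, so treat this as a sketch of the shape of the problem, not a prediction.

```python
def replay_backlog_mb(gen_mb_per_sec: float, replay_mb_per_sec: float,
                      seconds: float) -> float:
    """WAL (MB) received but not yet replayed after `seconds`, starting in sync."""
    surplus = gen_mb_per_sec - replay_mb_per_sec
    # If the standby replays at least as fast as the master writes, no backlog.
    return max(0.0, surplus) * seconds


def replay_lag_seconds(gen_mb_per_sec: float, replay_mb_per_sec: float,
                       seconds: float) -> float:
    """How far behind the master the standby's replayed state is, in time."""
    if gen_mb_per_sec <= 0:
        return 0.0
    # The unreplayed WAL took backlog / generation-rate seconds to produce.
    return replay_backlog_mb(gen_mb_per_sec, replay_mb_per_sec, seconds) / gen_mb_per_sec


# Hypothetical rates: master writes 10 MB/s of WAL, standby replays 8 MB/s.
# Lag is 720 s after 1 h, 5760 s after 8 h and 17280 s after 24 h.
for hours in (1, 8, 24):
    print(hours, replay_lag_seconds(10, 8, hours * 3600))
```

Any sustained gap between the two rates, however small, eventually makes the standby useless as a source of recent data.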

Maybe the best choice would be to stick with the current setup (one server, spinning rust) and just turn off synchronous_commit, since he is already willing to accept the loss of a few seconds of transactions.
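For scale: the PostgreSQL documentation on asynchronous commit says the WAL writer flushes every wal_writer_delay milliseconds and that the maximum risk window is three times that value (200 ms by default), which is well inside "a few seconds". Unlike fsync = off, it does not risk database corruption, only the loss of the most recent commits, and it can be set per session or per transaction with SET. The sketch below assumes a steady commit rate; the 500 commits/s figure is illustrative only.

```python
# Default value of PostgreSQL's wal_writer_delay setting, in milliseconds.
DEFAULT_WAL_WRITER_DELAY_MS = 200


def async_commit_risk_window_ms(
        wal_writer_delay_ms: float = DEFAULT_WAL_WRITER_DELAY_MS) -> float:
    """Maximum time an acknowledged commit may remain unflushed with
    synchronous_commit = off: three times wal_writer_delay, per the docs."""
    return 3 * wal_writer_delay_ms


def commits_at_risk(commits_per_sec: float,
                    wal_writer_delay_ms: float = DEFAULT_WAL_WRITER_DELAY_MS) -> float:
    """Rough upper bound on acknowledged commits lost in a crash, assuming
    a steady commit rate (a hypothetical estimate, not a guarantee)."""
    return commits_per_sec * async_commit_risk_window_ms(wal_writer_delay_ms) / 1000


print(async_commit_risk_window_ms())  # 600
print(commits_at_risk(500))           # 300.0
```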

Cheers,

Jeff