Re: Reliability with RAID 10 SSD and Streaming Replication

Greg Smith <greg@xxxxxxxxxxxxxxx> · Wed, 22 May 2013 15:30:30 -0400

On 5/22/13 3:06 PM, Joshua D. Drake wrote:
> Greg, can you elaborate on the SSD + Xlog issue? What type of burn
> through are we talking about?

You're burning through flash cells at a multiple of the total WAL write 
volume.  The system I gave iostat snapshots from upthread (with the 
Intel 710 hitting its limit) archives about 1TB of WAL each week.  The 
actual amount of WAL written in terms of erased flash blocks is even 
higher though, because sometimes the flash is hit with partial page 
writes.  The write amplification of WAL is much worse than that of the main
database.
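Here is a rough sketch of why partial page writes push flash wear above the archived WAL volume. It uses a simplified model assumed here, not measured on the system above. Every commit forces a flush, and each flush rewrites the whole WAL page it touches, including the partially filled page that earlier commits already wrote. The 8192-byte WAL page is PostgreSQL's default XLOG_BLCKSZ. FLASH_PAGE, flash_bytes_for_commits and write_amplification are hypothetical names and values. The model also ignores controller-side effects such as erase blocks, garbage collection and write caching.

```python
import math

# Hypothetical sizes for illustration. 8192 bytes is PostgreSQL's default
# WAL page size (XLOG_BLCKSZ); the flash page size is an assumed value
# and differs between drives.
WAL_PAGE = 8192
FLASH_PAGE = 4096


def flash_bytes_for_commits(commit_sizes):
    """Return (bytes sent to flash, bytes of WAL generated).

    Simplified model: every commit forces a flush, and each flush writes
    every WAL page it touches in full, including a partially filled page
    that earlier commits already wrote. Controller-side effects (erase
    blocks, garbage collection, write caching) are ignored.
    """
    offset = 0  # end of the WAL stream so far, in bytes
    flash_bytes = 0
    flash_pages_per_wal_page = math.ceil(WAL_PAGE / FLASH_PAGE)
    for size in commit_sizes:
        if size <= 0:
            continue
        first_page = offset // WAL_PAGE
        offset += size
        last_page = (offset - 1) // WAL_PAGE
        pages_touched = last_page - first_page + 1
        flash_bytes += pages_touched * flash_pages_per_wal_page * FLASH_PAGE
    return flash_bytes, offset


def write_amplification(commit_sizes):
    """Ratio of bytes written to flash over bytes of WAL generated."""
    flash_bytes, wal_bytes = flash_bytes_for_commits(commit_sizes)
    return flash_bytes / wal_bytes if wal_bytes else 0.0


# Many small commits rewrite the same partial WAL page over and over.
print(write_amplification([200] * 1000))   # about 41.9
# Page-sized commits land on fresh pages, so there is no rewrite penalty.
print(write_amplification([8192] * 10))    # 1.0
```

In this toy model, 200-byte commits inflate writes about 42x, while page-sized commits add nothing. Real drives coalesce some of these rewrites, so actual amplification sits somewhere in between.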

I gave a rough intro to this on the Intel drives at 
http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/ and there's 
a nice "Write endurance" table at 
http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-2.html

The cheapest of the Intel SSDs I have here only guarantees 15TB of total 
write endurance.  Eliminating >1TB of writes per week by moving the WAL 
off SSD is a pretty significant change, even though the burn rate isn't 
a simple linear thing--you won't burn the flash out in only 15 weeks.
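The 15-week figure is just rated endurance divided by weekly WAL volume. A minimal sketch of that naive linear estimate follows. The endurance numbers are the ones quoted in this message. rewrite_factor is a hypothetical multiplier for writes beyond the archived WAL, such as partial-page rewrites. As noted above, real wear isn't linear, so this is a yardstick rather than a failure prediction.

```python
# Rated endurance figures quoted in the message, in TB of writes.
RATED_ENDURANCE_TB = {
    "cheapest Intel SSD on hand": 15,
    "Intel 710 (production)": 900,
}


def naive_weeks_to_endurance(endurance_tb, wal_tb_per_week, rewrite_factor=1.0):
    """Weeks until rated endurance is consumed, assuming linear wear.

    rewrite_factor is a hypothetical multiplier for writes beyond the
    archived WAL volume (for example partial-page rewrites). Real wear
    is not linear, so treat the result as a rough yardstick, not a
    prediction of when the drive fails.
    """
    return endurance_tb / (wal_tb_per_week * rewrite_factor)


for name, tb in RATED_ENDURANCE_TB.items():
    weeks = naive_weeks_to_endurance(tb, wal_tb_per_week=1.0)
    print(f"{name}: {weeks:.0f} weeks at 1 TB of WAL per week")
```

With rewrite_factor=2.0 the cheap drive's naive figure drops to 7.5 weeks. That is one reason moving the WAL off the SSD matters more than the archived volume alone suggests.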

The production server is actually using the higher-grade 710 drives, which
are rated for 900TB instead.  But I do have standby servers using the
low-grade stuff, so anything I can do to decrease SSD burn rate without
dropping performance is useful.  And only the top tier of transaction 
rates will outrun a RAID1 pair of 15K drives dedicated to WAL.
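A back-of-the-envelope check of why a mirrored pair of 15K drives usually keeps up with WAL is sketched below. It converts the roughly 1TB/week from above into an average rate and compares it with an assumed drive throughput. ASSUMED_15K_SEQ_MB_S is a hypothetical figure, so measure your own hardware. It also shows the rough rotation-bound flush ceiling for a drive without a battery-backed write cache. Averages hide peaks, and commit-heavy loads tend to hit the flush ceiling before the bandwidth one.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800


def average_wal_rate_mb_s(wal_tb_per_week):
    """Average WAL write rate in MB/s, using decimal units (1 TB = 1e12 bytes)."""
    return wal_tb_per_week * 1e12 / SECONDS_PER_WEEK / 1e6


def max_flushes_per_second_no_cache(rpm):
    """Rough ceiling on synchronous flushes for one spinning drive.

    Without a battery-backed write cache, each commit flush waits for
    roughly one platter rotation, so rotations per second bounds the
    flush rate (group commit can fold several transactions into one flush).
    """
    return rpm / 60


# Hypothetical sequential write throughput for one 15K drive; measure
# your own hardware. RAID1 writes every block to both disks, so the
# pair's write throughput is roughly that of a single drive.
ASSUMED_15K_SEQ_MB_S = 150

rate = average_wal_rate_mb_s(1.0)
print(f"average WAL rate: {rate:.2f} MB/s")
print(f"headroom vs. assumed drive: {ASSUMED_15K_SEQ_MB_S / rate:.0f}x")
print(f"no-cache flush ceiling: {max_flushes_per_second_no_cache(15000):.0f}/s")
```

At about 1.65MB/s on average, sequential bandwidth is rarely the limit. With a write cache in front of the drives, only very high commit rates are likely to saturate the pair.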

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)