Re: Reliability with RAID 10 SSD and Streaming Replication

Greg Smith <greg@xxxxxxxxxxxxxxx> · Fri, 24 May 2013 08:11:53 -0400

On 5/22/13 2:45 PM, Shaun Thomas wrote:
> That read rate and that throughput suggest 8k reads. The queue size is
> 270+, which is pretty high for a single device, even when it's an SSD.
> Some SSDs seem to break down on queue sizes over 4, and 15 sectors
> spread across a read queue of 270 is pretty harsh. The drive tested here
> basically fell over on servicing a huge diverse read queue, which
> suggests a firmware issue.

That's basically it.  I don't know that I'd put the blame specifically 
onto a firmware issue without further evidence that's the case though. 
The last time I chased down an SSD performance issue like this it ended 
up being a Linux scheduler bug.  One thing I plan to do for future SSD 
tests is to try to replicate this issue better, starting by increasing 
the number of clients to at least 300.
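
As a rough check on the 8k inference in the quoted text, here is a small
sketch of the arithmetic.  It assumes the "15 sectors" figure is an
average request size in 512-byte sectors, the way iostat reports it, and
that the database uses the default 8192-byte PostgreSQL block size.  The
function names are made up for illustration.

```python
SECTOR_BYTES = 512      # assumption: iostat-style 512-byte sectors
PG_BLOCK_BYTES = 8192   # PostgreSQL default block size


def avg_request_bytes(avg_sectors, sector_bytes=SECTOR_BYTES):
    """Convert an average request size in sectors to bytes."""
    return avg_sectors * sector_bytes


def request_size_from_rates(bytes_per_sec, reads_per_sec):
    """Estimate the average read size from throughput and read rate."""
    return bytes_per_sec / reads_per_sec


size = avg_request_bytes(15)
# Report the request size and how it compares to one database block.
print(f"{size} bytes per request, "
      f"{size / PG_BLOCK_BYTES:.2f} of an 8k block")
```

Fifteen sectors works out to 7680 bytes, just under one 8k block, which
is consistent with a workload dominated by single-block reads.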

Related:  if anyone read my "Seeking PostgreSQL" talk last year, some of 
my Intel 320 results there were understating the drive's worst-case 
performance due to a testing setup error.  I have a blog entry talking 
about what was wrong and how it slipped past me at 
http://highperfpostgres.com/2013/05/seeking-revisited-intel-320-series-and-ncq/

With that loose end sorted, I'll be kicking off a brand new round of SSD 
tests on a 24-core server here soon.  All those will appear on my blog. 
The 320 drive is returning as the bang for buck champ, along with a DC 
S3700 and a Seagate 1TB Hybrid drive with NAND durable write cache.

--
Greg Smith   2ndQuadrant US    greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)