On Thu, May 23, 2013 at 1:56 AM, Andrea Suisani <sickpig@xxxxxxxxxxxx> wrote:
> On 05/22/2013 03:30 PM, Merlin Moncure wrote:
>>
>> On Tue, May 21, 2013 at 7:19 PM, Greg Smith <greg@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 5/20/13 6:32 PM, Merlin Moncure wrote:
>
> [cut]
>
>>> The only really huge gain to be had using SSD is commit rate at a low
>>> client count.  There you can easily do 5,000/second instead of a
>>> spinning disk that is closer to 100, for less than what the
>>> battery-backed RAID card alone costs to speed up mechanical drives.
>>> My test server's 100GB DC S3700 was $250.  That's still not two orders
>>> of magnitude faster though.
>>
>> That's most certainly *not* the only gain to be had: random read rates
>> on large databases (a very important metric for data analysis) can
>> easily hit 20k tps.  So I'll stand by the figure.  Another point: that
>> 5,000/second commit rate is sustained, whereas a RAID card will degrade
>> spectacularly once its cache overflows; it's not fair to compare burst
>> with sustained performance.  To hit a 5,000/second sustained commit
>> rate along with good random read performance, you'd need a very
>> expensive storage system.  Right now I'm working (not by choice) with a
>> tier-1 storage system (let's just say it rhymes with 'weefax') and I
>> would trade it for direct-attached SSD in a heartbeat.
>>
>> Also, note that third-party benchmarking shows the S3700 completely
>> smoking the 710 in database workloads (for example, see
>> http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/6).
>
> [cut]
>
> Sorry for interrupting, but on a related note I would like to know your
> opinions on what the AnandTech review said about the S3700's poor
> performance on "Oracle Swingbench", quoting the relevant part that you
> can find here (*):
>
> <quote>
>
> [..] There are two components to the Swingbench test we're running here:
> the database itself, and the redo log.
> The redo log stores all changes that are made to the database, which
> allows the database to be reconstructed in the event of a failure.  In
> good DB design, these two would exist on separate storage systems, but
> in order to increase IO we combined them both for this test.  Accesses
> to the DB end up being 8KB and random in nature, a definite strong suit
> of the S3700 as we've already shown.  The redo log however consists of
> a bunch of 1KB - 1.5KB, QD1, sequential accesses.  The S3700, like many
> of the newer controllers we've tested, isn't optimized for low queue
> depth, sub-4KB, sequential workloads like this. [..]
>
> </quote>
>
> Does this kind of scenario apply to postgresql wal files repo?

huh -- I don't think so.  wal file segments are 8kb aligned, ditto
clog, etc.  In XLogWrite():

    /* OK to write the page(s) */
    from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
    nbytes = npages * (Size) XLOG_BLCKSZ;   <--
    errno = 0;
    if (write(openLogFile, from, nbytes) != nbytes)
    {

AFAICT, that's the only way xlog gets written out.  One thing I would
definitely advise, though, is to disable full page writes
(full_page_writes) if it's enabled -- the S3700 is aligned on 8kb
blocks internally.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance