On 16 November 2011, 18:31, Cody Caughlan wrote:
>
> On Nov 16, 2011, at 8:52 AM, Tomas Vondra wrote:
>
>> On 16 November 2011, 2:21, Cody Caughlan wrote:
>>> How did you build your RAID array? Maybe I have a fundamental flaw /
>>> misconfiguration. I am doing it via:
>>>
>>> $ yes | mdadm --create /dev/md0 --level=10 -c256 --raid-devices=4 \
>>>       /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
>>> $ pvcreate /dev/md0
>>> $ vgcreate lvm-raid10 /dev/md0
>>> $ lvcreate -l 215021 lvm-raid10 -n lvm0
>>> $ blockdev --setra 65536 /dev/lvm-raid10/lvm0
>>> $ mkfs.xfs -f /dev/lvm-raid10/lvm0
>>> $ mkdir -p /data && mount -t xfs -o noatime /dev/lvm-raid10/lvm0 /data
>>
>> I'm not using EC2 much, and those were my first attempts with ephemeral
>> storage, so this may be a stupid question, but why are you building a
>> RAID-10 array on ephemeral storage, anyway?
>>
>> You already have a standby, so if the primary instance fails you can
>> easily fail over.
>>
>
> Yes, the slave will become master if the master goes down. We have no
> plan to try to resurrect the master in the case of failure, hence the
> choice of ephemeral over EBS.
>
> We chose RAID10 over RAID0 to get the best combination of performance
> and to minimize the probability of a single drive failure bringing down
> the house.
>
> So, yes, RAID0 would ultimately deliver the best performance, with more
> risk.
>
>> What are you going to do in case of a drive failure? With a physical
>> server this is rather easy - just put in a new drive and you're done,
>> but can you do that on EC2? I guess you can't do that while the
>> instance is running, so you'll have to switch to the standby anyway,
>> right? Have you ever tried this (how it affects performance etc.)?
>>
>
> As far as I know one cannot alter the ephemeral drives in a running
> instance, so yes, the whole instance would have to be written off.
>
>> So what additional protection does that give you? Wouldn't a RAID-0 be
>> a better utilization of the resources?
>>
>
> Too much risk.

Why? If I understand it correctly, the only case where RAID-10 actually
helps is when an ephemeral drive fails but the instance as a whole does
not. Do you have any numbers on how often that happens, i.e. how often a
drive fails without taking the instance with it?

And you can't actually replace the failed drive anyway, so the only
option you have is to fail over to the standby - right? Sure - but with
async replication, you could lose the not-yet-sent transactions.

I see two possible solutions:

a) use sync rep, available in 9.1 (you already run 9.1.1)

b) place the WAL on an EBS volume, mounted as part of the failover

EBS volumes are not exactly fast, but it seems (e.g.
http://www.mysqlperformanceblog.com/2009/08/06/ec2ebs-single-and-raid-volumes-io-bencmark/)
the sequential performance might be acceptable.

According to the stats you've posted, you've written about 5632 MB of WAL
data per hour. That's 5632 MB / 3600 s, i.e. roughly 1.5 MB/s on average,
and that might be handled by an EBS volume. Of course, if you have a peak
where you need to write much more data than that, it's going to be a
bottleneck.

Tomas
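
For (a), a minimal sketch of what the sync rep setup might look like on
9.1 - the standby name "standby1" and the hostname are just placeholders,
use whatever application_name your standby actually sends:

On the master (postgresql.conf):

  synchronous_standby_names = 'standby1'  # application_name of the standby
  synchronous_commit = on                 # commits wait for the standby ack

On the standby (recovery.conf), the connection string has to carry the
same name:

  primary_conninfo = 'host=master.internal application_name=standby1'

Keep in mind that every commit then waits for the standby to acknowledge
the WAL, so the network latency between the two instances adds directly
to your commit latency.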
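
For (b), a rough sketch of how that might look - the device name /dev/xvdf
and the paths are assumptions, adjust them to your layout (I'm assuming
the data directory sits under /data as in your mount command above):

  # format and mount the EBS volume (with the cluster stopped)
  $ mkfs.xfs /dev/xvdf
  $ mkdir -p /wal && mount -t xfs -o noatime /dev/xvdf /wal

  # move pg_xlog onto the EBS volume and symlink it back
  $ mv /data/pgdata/pg_xlog /wal/pg_xlog
  $ ln -s /wal/pg_xlog /data/pgdata/pg_xlog

On failover you'd detach the volume from the dead master, attach and
mount it on the standby, and feed it the WAL segments the standby hasn't
received yet before promoting it.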