Re: Intel 710 pgbench write latencies

On 2011-11-02 15:26, Merlin Moncure wrote:
> On Wed, Nov 2, 2011 at 8:05 AM, Yeb Havinga <yebhavinga@xxxxxxxxx> wrote:
>> Hello list,
>>
>> An OCZ Vertex 2 PRO and an Intel 710 SSD, both 100GB, in a software RAID 1
>> setup. I was pretty convinced this was the perfect solution to run
>> PostgreSQL on SSDs without an IO controller with BBU: no worries about
>> strange firmware bugs because the two drives are different, good write
>> endurance on the 710, access to the SMART attributes, and complete control
>> over the disks, with nothing hidden by a hardware RAID IO layer.
>>
>> Then I ran a pgbench test:
>> - bigger than RAM (~30GB database with 24GB RAM)
>> - during the test I removed the Intel 710, and 10 minutes later I inserted
>>   it again and added it back to the array.
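
A minimal sketch of how such a run can be scripted, assuming the mirror is
/dev/md0 and the 710's member partition is /dev/sdb1 (the physical hot-pull
can be approximated with mdadm --fail/--remove):

  # ~30GB database: pgbench scale 1 is roughly 15MB, so use scale 2000
  pgbench -i -s 2000 bench

  # long run, with -l logging each transaction's latency to pgbench_log.<pid>
  pgbench -c 16 -j 4 -T 3600 -l bench &

  # mid-run: drop the 710 out of the mirror...
  mdadm /dev/md0 --fail /dev/sdb1
  mdadm /dev/md0 --remove /dev/sdb1

  # ...and 10 minutes later add it back; watch the resync in /proc/mdstat
  sleep 600
  mdadm /dev/md0 --add /dev/sdb1
  cat /proc/mdstat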

>> The pgbench transaction latency graph is here: http://imgur.com/JSdQd
>>
>> With only the OCZ, latencies are acceptable, but with two drives there are
>> latencies up to 3 seconds (and 11 seconds at disk-remove time)! Is this due
>> to software RAID, or is it the Intel 710? To figure that out I repeated the
>> test, but now removing the OCZ; latency graph at: http://imgur.com/DQa59
>> (The 12-second maximum was at disk-remove time.)
>>
>> So the Intel 710 kind of sucks latency-wise. Is it because it is also
>> heavily reading, and maybe WAL should not be put on it?
>>
>> I did another test, same as before but (setup sketched below):
>> * with a 5GB database, completely fitting in RAM (24GB)
>> * WAL on a ramdisk
>> * started on the mirror
>> * during the test, mdadm --fail on the Intel SSD
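
A sketch of the WAL-on-ramdisk step, assuming $PGDATA points at the data
directory and WAL lives in pg_xlog (newer PostgreSQL releases call it
pg_wal). WAL on tmpfs sacrifices crash durability, so this is only for
latency experiments:

  # stop the server before touching pg_xlog
  pg_ctl -D $PGDATA stop

  # create a ramdisk and move WAL onto it
  mkdir -p /mnt/waldisk
  mount -t tmpfs -o size=2G tmpfs /mnt/waldisk
  mv $PGDATA/pg_xlog /mnt/waldisk/pg_xlog
  ln -s /mnt/waldisk/pg_xlog $PGDATA/pg_xlog

  pg_ctl -D $PGDATA start

  # then, mid-benchmark, fail the Intel SSD out of the mirror
  mdadm /dev/md0 --fail /dev/sdb1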

>> The latency graph is at: http://imgur.com/dY0Rk
>>
>> So still: with the Intel 710 participating in writes (beginning of the
>> graph), some latencies are over 2 seconds; with only the OCZ, max write
>> latencies are near 300ms.
>>
>> I'm now contemplating not using the 710 at all. Why should I not buy two
>> 6Gbps SSDs without a supercap (e.g. Intel 510 and OCZ Vertex 3 Max IOPS)
>> with an IO controller + BBU?
>>
>> Benefits: should be faster for all kinds of reads and writes.
>> Concerns: TRIM becomes impossible (it was already impossible with md
>> RAID 1; LVM / dm-based mirroring could work) - but is TRIM important for
>> a PostgreSQL IO load without e.g. routine TRUNCATEs? Also, the write
>> endurance of these drives is probably a lot less than in the previous
>> setup.
> Software RAID (mdadm) is currently blocking TRIM. The only way to get
> TRIM in a RAID-ish environment is through LVM mirroring/striping or with
> btrfs RAID (which is not production-ready AFAIK).
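
For concreteness, the LVM route would look something like this - a sketch
assuming member partitions /dev/sdb1 and /dev/sdc1 and ext4 with online
discard (whether discards actually pass through device-mapper to the SSDs
depends on the kernel version):

  # build a mirrored logical volume across the two SSDs
  pvcreate /dev/sdb1 /dev/sdc1
  vgcreate ssdvg /dev/sdb1 /dev/sdc1
  lvcreate -L 80G -m 1 --mirrorlog core -n pgdata ssdvg

  # filesystem-level TRIM: ext4 mounted with online discard
  mkfs.ext4 /dev/ssdvg/pgdata
  mount -o discard /dev/ssdvg/pgdata /var/lib/pgsql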

> Given that, if you do use software RAID, it's not a good idea to
> partition the entire drive, because the very first thing the RAID driver
> does is write to the entire device.

If that is bad because of decreased lifetime, I don't think this number of
writes is significant: in a few hours of pgbenching, the GBs written are
more than 10 times the capacity of the drives. Or do you suggest this
because the disk firmware can then operate assuming a smaller IDEMA
capacity, thereby prolonging the drive's life? (E.g. the Intel 710 200GB
has 200GB IDEMA capacity but 320GB of raw flash.)
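
Rough numbers, assuming the controller can treat never-written (or trimmed)
LBAs as spare area: the 200GB 710's factory over-provisioning is
(320 - 200) / 320 = 37.5%. If only 140GB of it were ever partitioned and
written, the effective spare area would grow to (320 - 140) / 320 ≈ 56%.
That is also why wiping before repartitioning matters: LBAs that were
written once and never trimmed still count as live data to the controller.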

> I would keep at least 20-30% of both drives unpartitioned, to leave the
> controller room for wear leveling and other housekeeping. I'd try wiping
> the drives, repartitioning, and repeating your test. I would also compare
> times through mdadm and directly to the device.

Good idea.
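
Something like the following, I assume - a sketch using a 70% partition
(via parted) and fio for the raw-device comparison; note that the fio
write jobs destroy whatever is on their targets:

  # leave ~30% of the drive unpartitioned for the controller
  parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 70%

  # random 8k writes (PostgreSQL's block size) straight to the partition,
  # bypassing mdadm; WARNING: overwrites the partition's contents
  fio --name=rawssd --filename=/dev/sdb1 --rw=randwrite --bs=8k \
      --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based

  # the same job against the md mirror for comparison
  fio --name=mdssd --filename=/dev/md0 --rw=randwrite --bs=8k \
      --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based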

-- Yeb



