On 3/13/2012 6:21 PM, troby wrote:

> Short of recreating the filesystem with the correct stripe width, would it
> make sense to change the mount options to define a stripe width that
> actually matches either the filesystem (11 stripe elements wide) or the
> hardware (12 stripe elements wide)? Is there a danger of filesystem
> corruption if I give fstab a mount geometry that doesn't match the values
> used at filesystem creation time?

What would make sense is for you to first show

$ cat /etc/fstab
$ xfs_info /dev/raid_device_name

before we recommend any changes.

> I'm unclear on the role of the RAID hardware cache in this. Since the writes
> are sequential,

This seems to be an assumption at odds with other information you've
provided.

> and since the volume of data written is such that it would
> take about 3 minutes to actually fill the RAID cache,

The PERC 700 operates in write-through cache mode if no BBU is present, or
if the battery is degraded or has failed. You did not state whether your
PERC 700 has the BBU installed. If not, you can increase write performance
and decrease latency pretty substantially by adding the BBU, which enables
write-back cache mode.

You may want to check whether MongoDB uses fsync writes by default. If it
does, and you don't have the BBU and write-back cache, this is affecting
your write latency and throughput as well.

> I would think the data
> would be resident in the cache long enough to assemble a full-width stripe
> at the hardware level and avoid the 4 I/O RAID5 penalty.

Again, write-back cache is only enabled with a BBU on the PERC 700. Do note
that achieving full-stripe-width writes is as much a function of your
application workload and filesystem tuning as it is of the RAID firmware,
especially if the cache is in write-through mode, in which case the firmware
can't do much, if anything, to maximize full-width stripes.

And keep in mind you won't hit the parity read-modify-write penalty on new
stripe writes.
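As a rough sketch of how the mount-time geometry maps onto the hardware:
XFS's sunit/swidth mount options are expressed in 512-byte sectors. The
64KB chunk size below is purely an assumed example value, not something
stated in this thread -- check your controller's actual stripe element
size before using any numbers.

```shell
# Illustrative only: compute sunit/swidth mount options for a 12-disk
# RAID5 (11 data disks per stripe), assuming a 64KB per-disk chunk.
chunk_kb=64        # per-disk stripe element size in KB (assumed)
data_disks=11      # 12-disk RAID5 = 11 data disks + 1 parity per stripe
sunit=$(( chunk_kb * 1024 / 512 ))   # sunit is in 512-byte sectors
swidth=$(( sunit * data_disks ))     # full data stripe width in sectors
echo "mount -o sunit=$sunit,swidth=$swidth"
```

With those assumed values this prints sunit=128,swidth=1408; substitute
your real chunk size and data-disk count.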
This only happens when rewriting existing stripes. Your reported 50ms of
latency for 100KB write IOs seems to suggest you don't have the BBU
installed and you're actually doing RMW on existing stripes, not strictly
new stripe writes. This is likely because...

As an XFS filesystem gets full (you're at ~87%), file blocks may begin to
be written into free space within existing, partially occupied RAID
stripes. This is where the RAID5/6 RMW penalty really kicks you in the
a$$, especially if you have misaligned the filesystem geometry to the
underlying RAID geometry.

-- 
Stan

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
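(For anyone skimming the archive, here's the arithmetic behind the "4 I/O
RAID5 penalty" discussed above -- an illustrative sketch only, using the
12-disk layout from this thread.)

```shell
# RAID5 small-write RMW vs full-stripe write, 12-disk array example.
# A small write into an existing stripe costs 4 disk IOs:
#   read old data + read old parity + write new data + write new parity.
# A full new-stripe write of all data chunks costs one write per data
# disk plus one parity write, amortizing parity across the whole stripe.
data_disks=11                            # 12 disks, 1 parity per stripe
rmw_ios_per_chunk=4                      # per small in-place chunk rewrite
full_stripe_ios=$(( data_disks + 1 ))    # 11 data writes + 1 parity write
echo "RMW: $rmw_ios_per_chunk IOs per chunk rewritten"
echo "Full stripe: $full_stripe_ios IOs for $data_disks chunks of new data"
```

That's roughly 4 IOs per chunk for in-place rewrites versus about 1.1 IOs
per chunk for aligned full-stripe writes, which is why alignment and free
space matter so much here.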