RAID setups, usage, Q's: effect of spindle groups... etc... blah blah blah...


Stan Hoeppner wrote:
On 1/19/2013 6:46 PM, Dave Chinner wrote:
On Sat, Jan 19, 2013 at 03:55:17PM -0800, Linda Walsh wrote:

	All that talk about RAIDs recently got me a bit depressed when I
realized that while I can get fast linear speeds, my speeds when seeking
around are about 1/10th-1/20th of that... sigh.

	Might that indicate that I should go with smaller RAIDs with more
spindle groups?  I.e. instead of 3 groups of RAID5 striped as a 0, go for 4-5
groups of RAID5 striped as a 0?  Just aligning the darn things nearly takes a
rocket scientist!  But then we start talking about multiple spindles and
optimizing IOPS... ARG!... ;-)  (If it wasn't challenging, I'd find it
boring...)...
Somebody on the list might be able to help you with this - I don't
have the time right now as I'm deep in metadata CRC changes...

I have time, Dave.  Hey Linda, if you're going to re-architect your storage,
the first thing I'd do is ditch that RAID50 setup.  RAID50 exists strictly to
reduce some of the penalties of RAID5, but then you find new downsides
specific to RAID50, including the alignment issues you mentioned.
----
	Well, like I said, I don't have a pressing need, since it works
fairly well for most of my activities.  But it's hard to characterize my
workload, as I do development and experimenting.  My most recent failure
(well, not entirely a failure), one I wouldn't say is 'closed', was trying to
up the bandwidth between my workstation and my server.  I generally run my
server as a back-end file store for my workstation; though I do Linux devel,
I work & play through a Win7 workstation.  The server provides content for my
living room 'TV'[sic] as well as music.  Those are relatively low drain.  I
take breaks throughout the day from software work/programming to watch a
video or play the occasional game.  If I do any one thing for too long, I'm
liable to worsen back and RSI problems.

	I ran a diff between my primary media disk and a duplicate of it that
I use as a backup.  I'd just finished synchronizing them with rsync, then
decided to use my media library (~6.5T) as a test bed for the 'dedup' program
I'm working on.  (I screwed myself once before when I thought it was working
but it wasn't, and I didn't catch it immediately -- which is why I only used
it as a test bed after doing a full sync, and then ran a diff -r on the two
disks.)  Fortunately, even though the dedup program found and linked about
140 files on the media disk, they were all correct.  The diff ran in about 5
hours, averaging around 400+ MB/s, likely limited by the media disk as it's a
simple 5-disk/4-data-spindle RAID5.
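
(A quick back-of-the-envelope check in Python -- assuming the full ~6.5 TiB
was read over the ~5 hours; illustrative numbers, not measurements):

# Sanity check of the diff throughput quoted above.
TIB = 1024 ** 4

data_bytes = 6.5 * TIB      # size of the media library (assumed ~6.5 TiB)
elapsed_s  = 5 * 3600       # ~5 hour diff run

print(f"average rate: {data_bytes / elapsed_s / 1e6:.0f} MB/s")
# -> roughly 400 MB/s, which is plausible for 4 data spindles of 7.2K SATA.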

   I have 4 separate RAIDs:

1) Boot+OS: RAID5, 2 data spindles; short-stroked @ 50%, 68 GB 15K SAS
Hitachis -- just noticed today they aren't exactly matched: 2 are MUA3073RC,
1 is MBA3073RC.  Odd.  This array is optimized more for faster seeking (the
50% usage limited to the outside tracks) than for linear speed -- I may
migrate it to SSDs at some point.

2) Downloaded + online media + SW: RAID5, 4 data spindles using 2 TB
(1.819 TiB) Hitachi Ultrastar 7.2K SATAs (note: the disks in #3 & #4 are the
same type).

3) Main data + devel disk: RAID50, 12 data spindles in 3 groups of 4.  NOTE:
I tried and benched RAID60 but wasn't happy with the performance, not to
mention the disk-space hit; RAID10 would be a bit too decadent for my
usage/budget.

4) Backups: RAID6, 6 data spindles.  Not the fastest config, but not bad for
backups.

#3 is my play/devel/experimentation RAID; it's divided up with LVM.  #4 and
#2 have an LVM layer as well, but since it's currently a 1:1 mapping, it
doesn't come into play much other than eating a few MB and possibly allowing
me to reorganize them more easily in the future.  On #3 I'm currently using
12.31 TB in 20 partitions (but only 3 work partitions)... the rest are
snapshots (only 1 live snapshot; the others are copies of diffs for those
dates...).
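
(For reference, a rough usable-capacity sketch in Python for the arrays
above, treating "data spindles" as excluding parity and using the 1.819 TiB
figure quoted; approximate only):

# Rough usable capacity per array, from the spindle counts listed above.
SATA_TIB = 1.819    # the 2 TB Hitachi Ultrastars, as quoted above

arrays = {
    "#2 media/SW   (RAID5,  4 data)":  4 * SATA_TIB,
    "#3 data/devel (RAID50, 12 data)": 12 * SATA_TIB,
    "#4 backups    (RAID6,  6 data)":  6 * SATA_TIB,
}

for name, tib in arrays.items():
    print(f"{name}: ~{tib:.1f} TiB usable")
# #1 (boot/OS) is 2 x 68 GB data spindles short-stroked to 50%, i.e. ~68 GB.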

-------------
NOTE:
One thing that had me less happy than usual with the speed: the internal
battery on #3 was going through reconditioning, which meant the controller's
cache policy dropped to WT (write-through) instead of WB (write-back).  I
think that was causing me some noticeable slowdown -- I just found out about
it last night while reviewing the controller log.


Note -- I generally like the RAID50s.  They don't "REALLY" have a stripe size
of 768k -- that's just the optimal speed/write amount before a write hits the
same disk again.  But since it is a RAID50, any small write only needs to
update 1 of the RAID5 groups, so the effective stripe size is 256k, which is
far more reasonable/normal.
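
To make that arithmetic explicit, a small Python sketch; the 64 KiB per-disk
chunk is an assumption back-derived from the 768k figure, and the mkfs.xfs
line is just what that geometry would map to:

# Stripe arithmetic for the RAID50 described above.
chunk_kib      = 64    # per-disk strip size (assumed, not confirmed)
data_per_group = 4     # each RAID5 group is 4 data + 1 parity
groups         = 3     # groups striped together as the RAID0 layer

group_stripe_kib = chunk_kib * data_per_group   # 256 KiB: what a small write touches
full_stripe_kib  = group_stripe_kib * groups    # 768 KiB: full-width write

print(f"per-group stripe {group_stripe_kib}k, full width {full_stripe_kib}k")
# XFS alignment over the whole array would then look something like:
print(f"mkfs.xfs -d su={chunk_kib}k,sw={data_per_group * groups}")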

Cards: 1 internal Dell PERC 6/i (serving #1 & #2 above -- all internal),
       1 LSI MR9280DE-8e (serving #3 & #4).
Enclosures: 2 LSI DE1600-SAS (12 x 3.5" each).




Briefly describe your workload(s), the total capacity you have now, and
(truly) need now and project to need 3 years from now.
---
	3 years from now?  Ha!  Let's just say that the dollar dropping as
fast as disk prices over the past 4 years has flamboozled any normal
planning.

	I was mostly interested in how increasing the number of spindle
groups in a RAID50 would help parallelism.  My thought was that since each
member of a RAID0 can be read or written independently of any other member
(as there is no parity to check across members), IF I wanted to increase
parallelism (while hurting maximum throughput AND disk space), I **could**
reconfigure to... well, the extreme would be 5 groups of 2-data/3-disk
RAID5s.  That would, I think, theoretically (and if the controller is up to
it, which I think it is) allow *up to* 5 separate reads/writes to be served
in parallel, vs. now, where I think it should be 3.

	A middling approach is to use an extra disk (16 total instead of 15)
and go with 4 groups of RAID5 @ 3 data disks each -- which would give the
same space, but consume my spare.  I'm unclear about what it would do to
maximum throughput, but it would likely go down a bit on writes due to parity
overhead increasing from 25% to 33% of the data.
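
A little Python sketch comparing the three layouts I'm weighing (current 3x4,
the middling 4x3, the extreme 5x2); "independent ops" is the naive best case
of one op per RAID5 group -- real concurrency depends on the controller and
the access pattern:

# Compare RAID50 layouts: N groups of RAID5 striped as RAID0,
# all built from the same 1.819 TiB disks.
DISK_TIB = 1.819

layouts = [
    # (groups, data disks per group)
    (3, 4),   # current:  3 x (4+1) = 15 disks
    (4, 3),   # middling: 4 x (3+1) = 16 disks (consumes the spare)
    (5, 2),   # extreme:  5 x (2+1) = 15 disks
]

for groups, data in layouts:
    disks    = groups * (data + 1)
    usable   = groups * data * DISK_TIB
    overhead = 1 / data                 # parity relative to data: 25%, 33%, 50%
    print(f"{groups} x ({data}+1): {disks} disks, ~{usable:.1f} TiB usable, "
          f"parity/data = {overhead:.0%}, up to {groups} independent ops")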

	It was, I thought, a fairly simple question, but I have a history of
thinking things will be easier than they are, in proportion to how far away
something is (in the future, or being done by someone else! ;-))...



 If it is needed, I'll recommend vendor-specific hardware, if you like, that
will plug into your existing gear, or I can provide information on new,
dissimilar-brand storage gear.  And of course I'll provide the necessary
Linux and XFS configuration information optimized to the workload and
hardware.  I'm not trying to consult here, just providing
information/recommendations.
----
	My **GENERAL** plan, if prices had cooperated, was to move to 3 TB
SATAs and **maybe** a 3rd enclosure -- I sorta like the LSI ones; they seem
pretty solid.  I have tried a few others and generally found them not as
good, but I have looked on the economical side since this is for a home
office^h^h^h^h^h^hlab^h^h^hplay setup....



In general, yes, more spindles will always be faster if utilized properly.
But depending on your workload(s) you might be able to fix your performance
problems by simply moving your current array to non-parity RAID10, a layered
stripe over RAID1 pairs, a concat, etc., thus eliminating the RMW penalty
entirely.
----
	Consider this -- my max read and write (both) on my large array is
1 GB/s.  There's no way I could get that with a RAID10 setup without a much
larger number of disks.  Though I admit concurrency would rise... but I
generate most of my own workload, so usually I don't have too many things
going on at the same time... a few, maybe...
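
Rough Python arithmetic behind that claim -- the ~85 MB/s per-spindle
streaming rate is an assumption, not a measurement:

# Streaming throughput: current RAID50 vs. a RAID10 built from the same disks.
per_disk_mbs = 85          # assumed sustained rate per 7.2K SATA spindle
total_disks  = 15

raid50_data  = 12                   # 3 x (4+1): 12 data spindles
raid10_pairs = total_disks // 2     # 7 mirror pairs from the same disks

print(f"RAID50 streaming (read/write): ~{raid50_data * per_disk_mbs} MB/s")
print(f"RAID10 streaming writes:       ~{raid10_pairs * per_disk_mbs} MB/s")
# RAID10 reads can pull from both halves of each mirror, but large streaming
# writes are bounded by the pair count, so matching ~1 GB/s writes would
# indeed take noticeably more disks.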

	When an xfs_fsr kicks in and starts swallowing disk cache, *ahem*,
and the daily backup kicks in, AND the daily rsync to create a static
snapshot... things can slow down a bit... but rarely am I up at those hours...

	The most intensive is the xfs_fsr, partly due to it swallowing up
disk cache (it runs at nice -19 ionice -c3, and I can still feel it!)...

	I might play more with putting it in its own blkio cgroup and just
limiting its overall disk transactions... (not to mention fixing that
disk-buffer usage issue)...
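
Something like this is what I have in mind (Python sketch against cgroup v1
blkio, as on current kernels); the cgroup name, device major:minor, PID and
byte limit are all made-up placeholders:

# Throttle a process's block I/O on one device via a cgroup v1 blkio group.
import os

CG    = "/sys/fs/cgroup/blkio/xfs_fsr_throttle"   # hypothetical cgroup name
DEV   = "8:32"                                    # major:minor of the array (example)
LIMIT = 200 * 1024 * 1024                         # 200 MB/s cap (example)

os.makedirs(CG, exist_ok=True)

# Cap read and write bandwidth for that device within this cgroup.
for knob in ("blkio.throttle.read_bps_device", "blkio.throttle.write_bps_device"):
    with open(os.path.join(CG, knob), "w") as f:
        f.write(f"{DEV} {LIMIT}\n")

# Move an already-running xfs_fsr (pid found separately) into the cgroup.
xfs_fsr_pid = 12345                               # placeholder
with open(os.path.join(CG, "tasks"), "w") as f:
    f.write(str(xfs_fsr_pid))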

 You'll need more drives to maintain the same usable capacity,
---

(oh, a minor detail! ;^))...


;-)

Don't spend much time on this... (well, if you read it, that might be too
much already! ;-))...  As I said, it's not THAT important... it was mostly
about the effect of group count in a RAID50 on performance tradeoffs.

Thanks for any insights...(I'm always open to learning how wrong I am! ;-))...


_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

