Re: RAID-10 explicitly defined drive pairs?

On Mon, 09 Jan 2012 21:54:56 -0600 Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
wrote:

> On 1/9/2012 7:46 AM, Peter Grandi wrote:
> 
> > Those able to do a web search with the relevant keywords and
> > read documentation can find some mentions of single SSD RMW and
> > address/length alignment, for example here:
> > 
> >   http://research.cs.wisc.edu/adsl/Publications/ssd-usenix08.pdf
> >   http://research.microsoft.com/en-us/projects/flashlight/winhec08-ssd.pptx
> >   http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-09-2.pdf
> > 
> > Mentioned in passing as something pretty obvious, and there are
> > other similar mentions that come up in web searches because it
> > is a pretty natural application of thinking about RMW issues.
> 
> Yes, I've read such things.  I was alluding to the fact that there are at
> least a half dozen different erase block sizes and algorithms in use by
> different SSD manufacturers.  There is no standard.  And not all of them
> are published.  There is no reliable way to do such optimization
> generically.
> 
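
To make the address/length alignment point concrete, here is a minimal
sketch in C, assuming a made-up 512 KiB erase block (exactly the number
that varies by vendor and is rarely published).  It just counts how many
erase blocks a write touches only partially; each partial hit is a
candidate for read-modify-write inside the drive:

  /* Illustration only: counts erase blocks a write touches partially.
   * ERASE_BLOCK is an assumption; real drives differ and often don't
   * publish the value, which is the whole problem being discussed. */
  #include <stdio.h>
  #include <stdint.h>

  #define ERASE_BLOCK (512 * 1024ULL)   /* assumed, not a standard */

  static unsigned partial_blocks(uint64_t offset, uint64_t len)
  {
      unsigned n = 0;
      if (offset % ERASE_BLOCK)                 /* write starts mid-block */
          n++;
      if ((offset + len) % ERASE_BLOCK)         /* write ends mid-block   */
          n++;
      if (n == 2 && offset / ERASE_BLOCK == (offset + len - 1) / ERASE_BLOCK)
          n = 1;                                /* both ragged ends are the same block */
      return n;
  }

  int main(void)
  {
      printf("1 MiB write at offset 0:     %u partially written erase block(s)\n",
             partial_blocks(0, 1024 * 1024));
      printf("4 KiB write at 1 MiB + 4 KiB: %u partially written erase block(s)\n",
             partial_blocks(1024 * 1024 + 4096, 4096));
      return 0;
  }
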
> > Now I eagerly await your explanation of the amazing "Hoeppner
> > effect" by which address/length aligned writes on RAID0/1/10
> > have significant benefits and of the audacious "Hoeppner
> > principle" by which 'concat' is as good as RAID0 over the same
> > disks.
> 
> IIRC from a previous discussion I had with Neil Brown on this list,
> mdraid0, as with all the striped array code, runs as a single kernel
> thread, limiting its performance to that of a single CPU.  A linear
> concatenation does not run as a single kernel thread, but is simply an
> offset calculation routine that, IIRC, executes on the same CPU as the
> caller.  Thus one can theoretically achieve near 100% CPU scalability
> when using concat instead of mdraid0.  So the issue isn't partial stripe
> writes at the media level, but the CPU overhead caused by millions of
> the little bastards with heavy random IOPS workloads, along with
> increased numbers of smaller IOs through the SCSI/SATA interface,
> causing more interrupts thus more CPU time, etc.
> 
> I've not run into this single stripe thread limitation myself, but have
> read multiple cases where OPs can't get maximum performance from their
> storage hardware because their top level mdraid stripe thread is saturating
> a single CPU in their X-way system.  Moving from RAID10 to a linear
> concat gets around this limitation for small file random IOPS workloads.
>  Only when using XFS and a proper AG configuration, obviously.  This is
> my recollection of Neil's description of the code behavior.  I could
> very well have misunderstood, and I'm sure he'll correct me if that's
> the case, or you, or both. ;)

(oh dear, someone is Wrong on the Internet! Quick, duck into the telephone
booth and pop out as ....)

Hi Stan,
 I think you must be misremembering.
Neither RAID0 nor Linear has any threads involved.  They just redirect the
request to the appropriate devices.  Multiple threads can submit multiple
requests down through RAID0 and Linear concurrently.
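
To illustrate (this is a sketch, not the md driver code, and it assumes
equal-sized members and a fixed chunk size): both Linear and RAID0 boil
down to arithmetic that turns an array sector into a (member device,
member sector) pair, executed in the context of whoever submitted the I/O:

  /* Illustration of the remapping; not the actual md code.
   * Assumes NDEV equal-sized members and a fixed chunk size. */
  #include <stdio.h>
  #include <stdint.h>

  #define NDEV         4
  #define DEV_SECTORS  (1ULL << 30)     /* assumed member size in sectors */
  #define CHUNK        1024ULL          /* sectors per chunk (512 KiB)    */

  struct target { unsigned dev; uint64_t sector; };

  /* Linear (concat): walk past whole members; no striping arithmetic. */
  static struct target map_linear(uint64_t s)
  {
      struct target t = { (unsigned)(s / DEV_SECTORS), s % DEV_SECTORS };
      return t;
  }

  /* RAID0: chunks rotate round-robin across members. */
  static struct target map_raid0(uint64_t s)
  {
      uint64_t chunk = s / CHUNK, in_chunk = s % CHUNK;
      struct target t = { (unsigned)(chunk % NDEV),
                          (chunk / NDEV) * CHUNK + in_chunk };
      return t;
  }

  int main(void)
  {
      uint64_t s = 3 * DEV_SECTORS / 2;           /* some array sector */
      struct target a = map_linear(s), b = map_raid0(s);
      printf("linear: dev %u sector %llu\n", a.dev, (unsigned long long)a.sector);
      printf("raid0 : dev %u sector %llu\n", b.dev, (unsigned long long)b.sector);
      return 0;
  }
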

RAID1, RAID10, and RAID5/6 are different.  For reads they normally have
no contention with other requests, but for writes things do get
single-threaded at some point.

Hm... your text above sometimes talks about RAID0 vs Linear, and sometimes
about RAID10 vs Linear.  So maybe you are remembering correctly, but
presenting it incorrectly in part ....

NeilBrown

> 
> Dave Chinner had some input WRT XFS on concat for this type of workload,
> stating it's a little better than RAID10 (ambiguous as to hard/soft).
> Did you read that thread, Peter?  I know you're on the XFS list as well.
>  I can't exactly recall Dave's specific reasoning at this time; I'll try
> to dig it up.  I'm thinking it had to do with the different distribution
> of metadata IOs between the two AG layouts, and the amount of total head
> seeking required for the workload being somewhat higher for RAID10 than
> for the concat of RAID1 pairs.  Again, I could be wrong on that, but it
> seems familiar.  That discussion was many months ago.
> 
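
For what it's worth, the "proper AG configuration" reasoning can be
sketched with back-of-the-envelope arithmetic.  The numbers below are
illustrative assumptions (four equal 1 TB mirror pairs, agcount=16): if
agcount is a multiple of the number of concatenated pairs, every
allocation group falls entirely inside one pair, so concurrent
allocations in different AGs land on different spindles:

  /* Illustration only: shows which mirror pair each XFS allocation
   * group would occupy on a concat of equal-sized RAID1 pairs.
   * Pair size and agcount are assumptions, not recommendations. */
  #include <stdio.h>
  #include <stdint.h>

  #define NPAIRS     4
  #define PAIR_BYTES (1000ULL * 1000 * 1000 * 1000)   /* assumed 1 TB per pair */
  #define AGCOUNT    16                               /* multiple of NPAIRS    */

  int main(void)
  {
      uint64_t total   = (uint64_t)NPAIRS * PAIR_BYTES;
      uint64_t ag_size = total / AGCOUNT;

      for (unsigned ag = 0; ag < AGCOUNT; ag++) {
          uint64_t start = (uint64_t)ag * ag_size;
          uint64_t end   = start + ag_size - 1;
          printf("AG %2u: starts on pair %llu, ends on pair %llu\n", ag,
                 (unsigned long long)(start / PAIR_BYTES),
                 (unsigned long long)(end / PAIR_BYTES));
      }
      return 0;
  }
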

Attachment: signature.asc
Description: PGP signature

