Re: Software RAID5 write issues

On Thu, 2009-06-11 at 10:05 +1000, Steven Haigh wrote:
> Isn't the PCI Bus limited to around 133MB/sec? If so, even with 3  
> drives on the same controller, you would expect divided equally that  
> each drive would get ~44MB/sec before overheads - not around 7MB/sec  
> per drive. I know I'm not going to get phenomenal performance with my  
> setup, but as most the data is archiving (and then copied to tape), I  
> would like to get things at least up to a reasonable level instead of  
> having a write speed of ~12% of the read speed.

It is, but that bandwidth is shared amongst all the cards on that
particular bus, and older motherboards in particular would daisy-chain
busses such that a later bus only gets part of that bandwidth because
earlier busses are using it too.  Plus, 133MB/s is only the theoretical
maximum for the PCI bus; in practice you get less due to things like
bus arbitration overhead.
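
To put rough numbers on it, here's a quick back-of-the-envelope sketch
in Python (the 80% bus efficiency figure is purely an assumption on my
part, not a measurement of your hardware):

# Rough estimate of per-drive bandwidth on a shared 32-bit/33MHz PCI bus.
# The efficiency factor is an assumed fudge for arbitration and turnaround.
PCI_THEORETICAL_MBPS = 133
BUS_EFFICIENCY = 0.80          # assumption, not measured
DRIVES_ON_CARD = 3

usable = PCI_THEORETICAL_MBPS * BUS_EFFICIENCY
per_drive = usable / DRIVES_ON_CARD
print(f"usable bus bandwidth: ~{usable:.0f} MB/s")
print(f"share per drive with {DRIVES_ON_CARD} drives active: ~{per_drive:.0f} MB/s")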

> Hmmm - a very interesting read - but I am a little confused when it  
> comes to PCI bandwidth. I would assume (maybe wrongly) that if I can  
> READ from the array at 95MB/sec (as measured by bonnie++), then I  
> should be able to write to the same array at a little faster than 11MB/ 
> sec - as a read would usually read from 4 of 5 drives, however a write  
> would go to all drives. This being said, I wouldn't expect one extra  
> write to equal 12% of a read speed!

There are two factors involved in this (I'm speculating of course, but
here goes).

One, a read doesn't involve every drive in the array.  For any given
stripe, you only actually read from 4 of the 5 drives.  Since 3 of the
drives are on the card, that means that for 3 out of 5 stripes, one of
those drives will be the parity drive and therefore not used in the read
process.  So, for 3 out of 5 stripes, you actually read from two of the
drives behind the card and two on the motherboard.  For the other two
stripes, you read from three of the drives behind the card and one on
the motherboard.  That accounts for a reasonable amount of the
difference all by itself.

As an example, I have an external SATA drive case here that holds 4
drives on a repeater and uses a single eSATA cable to run all 4 drives.
When accessing a single drive, I get 132MB/s throughput.  When I access
two drives, it drops to 60MB/s per drive.  When accessing three drives,
it drops to 39MB/s per drive.  So you can see how, on read, not needing
to access all three drives behind the card can really help on specific
stripes.  In other words, reading from only 4 drives at a time *helps*
your performance, because whichever drives are in use behind the PCI
card run faster and can keep up better with the drives on the
motherboard.  Since writes always go to all 5 drives, you always get the
slower speed (and, on top of that, you are writing 25% more data to disk
relative to the amount of actual data transferred than when you are
reading).
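
If it helps, here's a toy Python sketch of how the reads spread across
the card and the motherboard as the parity rotates stripe by stripe (the
drive-to-controller layout and the simple rotation are assumptions for
illustration, not your exact md layout):

# Toy model: 5-drive RAID5, parity rotates each stripe and is skipped on
# reads.  Drives 0-2 are assumed to sit behind the PCI card, drives 3-4
# on the motherboard controller.
NUM_DRIVES = 5
CARD_DRIVES = {0, 1, 2}

for stripe in range(NUM_DRIVES):
    parity_drive = (NUM_DRIVES - 1 - stripe) % NUM_DRIVES  # rotating parity
    read_drives = [d for d in range(NUM_DRIVES) if d != parity_drive]
    on_card = sum(1 for d in read_drives if d in CARD_DRIVES)
    print(f"stripe {stripe}: read {on_card} card drive(s), "
          f"{len(read_drives) - on_card} motherboard drive(s)")

Run it and you'll see 3 of every 5 stripes only touch two drives behind
the card, which is exactly where the read path gets its advantage.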

Two, you're using a 1MB chunk size.  Given a 5-drive raid5, that gives a
4MB stripe width.  My guess is that your stripe size is large enough,
relative to your average write size, that your array is more often than
not performing a read/modify/write cycle instead of a full stripe write.
In a full stripe write, the md stack writes out all four data chunks and
a freshly calculated parity chunk without regard to what the parity or
data were before.  If, on the other hand, it doesn't have enough data to
write an entire stripe by the time it is flushing things out, then it
has to do a read/modify/write cycle.  The particulars of what's most
efficient in that case depend on how many chunks in the stripe are being
overwritten, but regardless it means reading in parts of the stripe and
the parity first, then doing xor operations, then writing the new data
and new parity back out.  That means at least some of the 5 drives are
doing both reads and writes for a single stripe operation.  So, I think
the combination of read/modify/write cycles and the poor performance of
the drives behind the PCI card can indeed account for the drastic
difference you are seeing between read and write speeds.
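
To make the xor bookkeeping concrete, here's a toy Python sketch of the
two write paths; it's purely illustrative (real md does this on pages
inside the kernel), but it shows why a partial write has to read before
it can write:

from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def full_stripe_write(new_chunks):
    # Enough data for the whole stripe: parity is just the xor of the
    # new data chunks, no reads needed.
    parity = reduce(xor, new_chunks)
    return new_chunks, parity

def read_modify_write(old_chunk, old_parity, new_chunk):
    # Partial write: read the old data chunk and the old parity first,
    # xor the old data out of the parity, then xor the new data in.
    new_parity = xor(xor(old_parity, old_chunk), new_chunk)
    return new_chunk, new_parity

# Quick consistency check with tiny 4-byte "chunks" standing in for 1MB ones.
chunks = [bytes([i] * 4) for i in (1, 2, 3, 4)]
_, parity = full_stripe_write(chunks)
new = bytes([9] * 4)
_, parity2 = read_modify_write(chunks[0], parity, new)
assert parity2 == reduce(xor, [new] + chunks[1:])
print("parity stays consistent after a read/modify/write")

The full stripe path never reads; the read/modify/write path has to pull
the old chunk and the old parity off the platters first, which is where
the extra seeks and the PCI contention really hurt.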

> The other thing I wonder is if it has something to do with the  
> sil_sata driver - as ALL the drives in the RAID5 are handled by that  
> kernel module. The boot RAID1 is on the ICH5 SATA controller - and  
> suffers no performance issues at all. It shows a good 40MB/sec+ read  
> AND write speeds per drive.

It's entirely possible that the driver plays a role in this, yes.  I
don't have any hardware that uses that driver, so I couldn't say.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
