Re: write performance of HW RAID VS MD RAID

On Thu, 11 Jun 2015 09:00:54 AM Neil Brown wrote:
> On Wed, 10 Jun 2015 15:27:07 -0700, Ming Lin <mlin@xxxxxxxxxx> wrote:
> > Hi NeilBrown,
> > 
> > As you may already see, I run a lot of tests with 10 HDDs for the patchset
> > "simplify block layer based on immutable biovecs"
> > 
> > Here is the summary.
> > http://minggr.net/pub/20150608/fio_results/summary.log
> > 
> > MD RAID6 read performance is OK.
> > But write performance is much lower than HW RAID6.
> > 
> > Is it a known issue?
> 
> It is not unexpected.
> There are two likely reasons.
> One is that HW RAID cards often have on-board NVRAM which is used as a
> write-behind cache.  This allows better throughput by hiding latency and
> more often gathering full-stripe writes.  HW RAID cards may also have
> accelerators for the parity calculations, but that is not likely to make a
> big difference. What sort of RAID6 controller do you have?
> 
> The other is that it is not easy for MD/RAID6 to schedule stripe writes
> optimally.  It doesn't really know whether more writes are coming (so it
> should wait) or whether it already has everything (so it should get to work
> straight away). It is possible that it could reply to writes as soon as they are in
> the (volatile) cache and only force things to storage when a REQ_FUA or
> REQ_FLUSH arrives.  That might help ... or it might corrupt filesystems :-(

And this is exactly the problem. Any conceptual change that risks filesystem - 
and therefore data - integrity is a bad one. Better benchmark numbers aren't 
worth the risk of losing data integrity.
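
For reference, the only durability contract an application gets today is the 
explicit flush: fsync()/fdatasync() are what end up as REQ_FLUSH/REQ_FUA at 
the block layer, which is exactly the point where a "reply early" scheme 
would have to keep its promise. A minimal userspace sketch (the path is 
hypothetical, error handling trimmed):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical path - any file on the MD RAID6 array */
	int fd = open("/mnt/raid6/testfile", O_WRONLY | O_CREAT, 0644);
	if (fd < 0) { perror("open"); return 1; }

	const char buf[] = "data we cannot afford to lose\n";
	if (write(fd, buf, sizeof(buf) - 1) < 0) { perror("write"); return 1; }

	/*
	 * write() returning only means the page cache has the data.
	 * fsync() forces it down the stack: the filesystem issues
	 * flush/FUA requests, so MD (and the drives) must make the
	 * data stable before this call returns.
	 */
	if (fsync(fd) < 0) { perror("fsync"); return 1; }

	close(fd);
	return 0;
}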

In a hardware card setup, one would hope that the write cache is battery 
backed - or flash - or something else that won't lose data if the power goes 
out. When you're running this in software, you can't magically keep data 
across a power loss - so the longer something sits unflushed, the longer the 
window in which that write can be lost.

If you extend this concept, writes aren't safe in transit between the write 
buffer in the kernel and the (hopefully) battery-backed RAM on the hardware 
card if power is lost. Nor are they safe while the card is writing to the 
physical disks - modern hard drives have large caches of their own! If the 
drive holds a write only in its cache and loses power, is that data gone?
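
For what it's worth, you can at least tell a drive to empty its cache: on 
Linux, fsync() on the block device node itself makes the kernel issue a 
cache-flush command (FLUSH CACHE / SYNCHRONIZE CACHE) to the drive. A rough 
sketch, assuming /dev/sdb is one of the array members (the device name is 
hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical device - one member of the RAID6 array */
	int fd = open("/dev/sdb", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	/*
	 * fsync() on a block device node makes the kernel send a
	 * cache-flush command to the drive; when it returns, anything
	 * the drive held only in its volatile cache should be on the
	 * platters - assuming the drive honours the command honestly.
	 */
	if (fsync(fd) < 0) { perror("fsync"); return 1; }

	close(fd);
	return 0;
}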

Guaranteed data integrity these days is a difficult subject. The kernel may say 
the data is written properly - but is it? The HW RAID card may say the data is 
written properly - but is it? Or is it still in cache? Or has it just hit the 
HDD cache?
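
The closest thing to an end-to-end answer is synchronous I/O: open with 
O_SYNC and every write() behaves as if it were followed by fsync(), so it 
doesn't return until each layer that implements flush/FUA semantics has 
acknowledged the data as stable. A sketch (the path is again hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/*
	 * O_SYNC makes every write() behave like write()+fsync(): it
	 * only returns once the kernel, the controller and (via
	 * flush/FUA) the drive have all claimed the data is stable.
	 */
	int fd = open("/mnt/raid6/journal", O_WRONLY | O_CREAT | O_SYNC, 0644);
	if (fd < 0) { perror("open"); return 1; }

	const char rec[] = "committed record\n";
	if (write(fd, rec, sizeof(rec) - 1) < 0) { perror("write"); return 1; }

	close(fd);
	return 0;
}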

What we currently have is a slight performance tradeoff for a minimisation of 
risk (as far as is practical, anyway) - and I'm OK with that.

-- 
Steven Haigh

Email: netwiz@xxxxxxxxx
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
