On Thu, 11 Jun 2015 09:00:54 AM Neil Brown wrote:
> On Wed, 10 Jun 2015 15:27:07 -0700
>
> Ming Lin <mlin@xxxxxxxxxx> wrote:
> > Hi NeilBrown,
> >
> > As you may already see, I run a lot of tests with 10 HDDs for the patchset
> > "simplify block layer based on immutable biovecs"
> >
> > Here is the summary.
> > http://minggr.net/pub/20150608/fio_results/summary.log
> >
> > MD RAID6 read performance is OK.
> > But write performance is much lower than HW RAID6.
> >
> > Is it a known issue?
>
> It is not unexpected.
> There are two likely reasons.
> One is that HW RAID cards often have on-board NVRAM which is used as a
> write-behind cache. This allows better throughput by hiding latency and
> more often gathering full-stripe writes. HW RAID cards may also have
> accelerators for the parity calculations, but that is not likely to make a
> big difference. What sort of RAID6 controller do you have?
>
> The other is that it is not easy for MD/RAID6 to schedule writes stripes
> optimally. It doesn't really know if more writes are coming, so it should
> wait, or if it already has everything - so it should get to work straight
> away. It is possible that it could reply to writes as soon as they are in
> the (volatile) cache and only force things to storage when a REQ_FUA or
> REQ_FLUSH arrives. That might help ... or it might corrupt filesystems :-(

And this here is the problem. Any conceptual changes that risk filesystem
and therefore data integrity are bad. For something as simple as benchmarks
it isn't really worth the risk of losing data integrity.

In a hardware card setup, one would hope that the write cache is battery
backed - or flash - or something that won't lose data if the power goes out.
When you're running this in software, you can't magically keep data if you
lose power - so the longer something is not flushed to disk, the longer the
risk period for a write.

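To make that risk window concrete, here is a minimal userspace sketch of the
only lever an application really has: write the data, then call fsync() so
the kernel pushes its own cached copy out and (on Linux, normally) asks the
device to flush as well. The name durable_write is hypothetical and the code
assumes a POSIX system. A clean return only means every layer *reported*
success; whether a controller or drive cache actually honoured the flush is
a separate question.

```python
import os


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and request a flush to stable storage.

    durable_write is an illustrative name, not an existing API.
    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than asked, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Until this call returns, the data may exist only in the kernel's
        # volatile page cache and would be lost on power failure.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

The time between os.write() returning and os.fsync() returning is the risk
period: the longer the flush is deferred, the more data is exposed.
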
If you want to extend this concept - then you're not safe from writes
between the write buffer in the kernel and the (hopefully) battery backed
RAM on the hardware card if power is lost. You're also not safe when the
card is writing to the physical disk - modern hard drives have massive
caches! If the drive has the write in its cache and loses power, is the data
gone?

Guaranteed data integrity these days is a difficult subject. The kernel may
say the data is written properly - but is it? The HW RAID card may say the
data is written properly - but is it? Or is it still in cache? Or has it
just hit the HDD cache?

What we currently have is a slight tradeoff in performance for a
minimalisation of risk (as far as practical anyway) - and I'm ok with this.

-- 
Steven Haigh
Email: netwiz@xxxxxxxxx
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897