Re: Raid 4/5 small writes

On Sunday April 16, davidsen@xxxxxxx wrote:
> Neil Brown wrote:
> >
> >If you are writing exactly half the data in a stripe, I think it takes
> >the first option (Read the old unchanged data) as that is fewer reads
> >than reading the old changed data and the parity block.
> >
> >Does that make sense?
> >
> You can do some simulation of this, but consider this scenario: I have a
> write which changes five of the eight data blocks of a RAID4/5 array
> (8 data + 1 parity). If I read the old unchanged data, I read blocks from
> three drives and write to six, generating a seek+I/O on every drive in
> the array.
> 
> If I read the old data and parity only on the drives which will be
> rewritten, I generate two seeks and two I/Os on each of the five data
> drives and the one parity drive. However, I have not generated seeks on
> the other three drives, and the second seek on each drive will either be
> a no-op, if the data are on a single cylinder, or will be small: the
> seek from the end of the old data back to its start.
> 
> I think which approach performs better would depend on the size of the
> I/O and seek (small, or the write would touch every data drive), and on
> the load (leaving three drives free to do user I/O could be a gain). Am
> I missing something? I don't disagree with the way it works, I'm just
> not sure it's optimal when the write doesn't reach every drive in the
> array.
> 

Optimising for a single IO request is pretty pointless.  The situation
you want to optimise for is lots of IO requests arriving, queueing up,
and keeping the array busy.  In that case I suspect you would want to
spread the load evenly over all devices and keep the total number of
IOs to a minimum.  That is what the current code tries to do.

Of course, this is pure theory, and so could be purely wrong.  The only
way to tell is to take some measurements on some interesting loads.

It would be very easy to change the code to always do read-modify-write
or always do reconstruct-write, and then test performance on some real
benchmark.  I would certainly be interested if any benchmarks were
significantly affected by the choice.

So I agree with your first comment: doing some simulations is best.
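
As a starting point, here is a minimal sketch of such a simulation,
assuming 8 data disks plus one parity disk, uniformly random partial
writes, and cost measured in device I/Os only (no seek or queueing
model, which is exactly where the disagreement above lies):

/* Sketch: total device I/Os under three parity-update policies.
 * Hypothetical and simplified; not a real benchmark. */
#include <stdio.h>
#include <stdlib.h>

#define DATA_DISKS 8

enum policy { CHEAPEST, ALWAYS_RMW, ALWAYS_RCW };

static long cost(int policy, int nwrite)
{
	long rmw = (long)(nwrite + 1) * 2;	/* reads + writes */
	long rcw = (DATA_DISKS - nwrite) + (nwrite + 1);

	if (policy == ALWAYS_RMW)
		return rmw;
	if (policy == ALWAYS_RCW)
		return rcw;
	return rmw < rcw ? rmw : rcw;	/* what the current code aims for */
}

int main(void)
{
	long total[3] = { 0, 0, 0 };
	int i, p;

	srand(1);	/* fixed seed keeps runs comparable */
	for (i = 0; i < 1000000; i++) {
		int nwrite = 1 + rand() % DATA_DISKS;	/* 1..8 dirty blocks */
		for (p = 0; p < 3; p++)
			total[p] += cost(p, nwrite);
	}
	printf("total I/Os: cheapest=%ld always-rmw=%ld always-rcw=%ld\n",
	       total[0], total[1], total[2]);
	return 0;
}

Weighting seeks and modelling the spindles left idle would be the
obvious next refinement.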

NeilBrown
