Re: raid0 vs. mkfs

NeilBrown <neilb@xxxxxxxx> · Mon, 12 Dec 2016 14:17:58 +1100

On Fri, Dec 09 2016, Coly Li wrote:

> On 2016/12/9 上午3:19, Shaohua Li wrote:
>> On Fri, Dec 09, 2016 at 12:44:57AM +0800, Coly Li wrote:
>>> On 2016/12/8 上午12:59, Shaohua Li wrote:
>>>> On Wed, Dec 07, 2016 at 07:50:33PM +0800, Coly Li wrote:
>>> [snip]
>>>> Thanks for doing this, Coly! For raid0, this totally makes sense. The raid0
>>>> zones make things a little complicated though. I just had a brief look of your
>>>> proposed patch, which looks really complicated. I'd suggest something like
>>>> this:
>>>> 1. split the bio according to zone boundary.
>>>> 2. handle the splitted bio. since the bio is within zone range, calculating
>>>> the start and end sector for each rdev should be easy.
>>>>
>>>
>>> Hi Shaohua,
>>>
>>> Thanks for your suggestion! I try to modify the code by your suggestion,
>>> it is even more hard to make the code that way ...
>>>
>>> Because even split bios for each zone, all the corner cases still exist
>>> and should be taken care in every zoon. The code will be more complicated.
>> 
>> Not sure why it makes the code more complicated. Probably I'm wrong, but Just
>> want to make sure we are in the same page: split the bio according to zone
>> boundary, then handle the splitted bio separately. Calculating end/start point
>> of each rdev for the new bio within a zone should be simple. we then clone a
>> bio for each rdev and dispatch. So for example:
>> Disk 0: D0 D2 D4 D6 D7
>> Disk 1: D1 D3 D5
>> zone 0 is from D0 - D5, zone 1 is from D6 - D7
>> If bio is from D1 to D7, we split it to 2 bios, one is D1 - D5, the other D6 - D7.
>> For D1 - D5, we dispatch 2 bios. D1 - D5 for disk 1, D2 - D4 for disk 0
>> For D6 - D7, we just dispatch to disk 0.
>> What kind of corner case makes this more complicated?
>>  
>
> Let me explain the corner cases.
>
> When upper layer code issues a DISCARD bio, the bio->bi_iter.bi_sector
> may not be chunk size aligned, and bio->bi_iter.bi_size may not be
> (chunk_sects*nb_dev) sectors aligned. In raid0, we can't simply round
> up/down them into chunk size aligned number, otherwise data
> lost/corruption will happen.
>
> Therefore for each DISCARD bio that raid0_make_request() receive, the
> beginning and ending parts of this bio should be treat very carefully.
> All the corner cases *come from here*, they are not about number of
> zones or rdevs, it is about whether bio->bi_iter.bi_sector and
> bio->bi_iter.bi_size are chunk size aligned or not.
>
> - beginning of the bio
>   If bio->bi_iter.bi_sector is not chunk size aligned, current raid0
> code will split the beginning part into split bio which only contains
> sectors from bio->bi_iter.sector to next chunk size aligned offset, and
> issue this bio by generic_make_request(). But in
> discard_large_discard_bio() we can't issue the split bio now, we have to
> record lenth of this split bio into a per-device structure, and issue a

Why?

Why cannot you just split of the start of the bio and chain it with the
rest of the bio?

If the bio doesn't start at the beginning of a stripe, just split of the
first (partial) chunk exactly was we currently do.

If it does start at the beginning of a stripe, then split off a whole
number of stripes and allocate one bio for each device.  Chain those to
the original bio together with any remainder (which isn't a whole
stripe).

I think that if you make use of bio_split() and bio_chain() properly,
the code will be much simpler.

NeilBrown
Attachment:
signature.asc

Description: PGP signature