On Fri, Dec 09 2016, Coly Li wrote: > On 2016/12/9 上午3:19, Shaohua Li wrote: >> On Fri, Dec 09, 2016 at 12:44:57AM +0800, Coly Li wrote: >>> On 2016/12/8 上午12:59, Shaohua Li wrote: >>>> On Wed, Dec 07, 2016 at 07:50:33PM +0800, Coly Li wrote: >>> [snip] >>>> Thanks for doing this, Coly! For raid0, this totally makes sense. The raid0 >>>> zones make things a little complicated though. I just had a brief look of your >>>> proposed patch, which looks really complicated. I'd suggest something like >>>> this: >>>> 1. split the bio according to zone boundary. >>>> 2. handle the splitted bio. since the bio is within zone range, calculating >>>> the start and end sector for each rdev should be easy. >>>> >>> >>> Hi Shaohua, >>> >>> Thanks for your suggestion! I try to modify the code by your suggestion, >>> it is even more hard to make the code that way ... >>> >>> Because even split bios for each zone, all the corner cases still exist >>> and should be taken care in every zoon. The code will be more complicated. >> >> Not sure why it makes the code more complicated. Probably I'm wrong, but Just >> want to make sure we are in the same page: split the bio according to zone >> boundary, then handle the splitted bio separately. Calculating end/start point >> of each rdev for the new bio within a zone should be simple. we then clone a >> bio for each rdev and dispatch. So for example: >> Disk 0: D0 D2 D4 D6 D7 >> Disk 1: D1 D3 D5 >> zone 0 is from D0 - D5, zone 1 is from D6 - D7 >> If bio is from D1 to D7, we split it to 2 bios, one is D1 - D5, the other D6 - D7. >> For D1 - D5, we dispatch 2 bios. D1 - D5 for disk 1, D2 - D4 for disk 0 >> For D6 - D7, we just dispatch to disk 0. >> What kind of corner case makes this more complicated? >> > > Let me explain the corner cases. > > When upper layer code issues a DISCARD bio, the bio->bi_iter.bi_sector > may not be chunk size aligned, and bio->bi_iter.bi_size may not be > (chunk_sects*nb_dev) sectors aligned. In raid0, we can't simply round > up/down them into chunk size aligned number, otherwise data > lost/corruption will happen. > > Therefore for each DISCARD bio that raid0_make_request() receive, the > beginning and ending parts of this bio should be treat very carefully. > All the corner cases *come from here*, they are not about number of > zones or rdevs, it is about whether bio->bi_iter.bi_sector and > bio->bi_iter.bi_size are chunk size aligned or not. > > - beginning of the bio > If bio->bi_iter.bi_sector is not chunk size aligned, current raid0 > code will split the beginning part into split bio which only contains > sectors from bio->bi_iter.sector to next chunk size aligned offset, and > issue this bio by generic_make_request(). But in > discard_large_discard_bio() we can't issue the split bio now, we have to > record lenth of this split bio into a per-device structure, and issue a Why? Why cannot you just split of the start of the bio and chain it with the rest of the bio? If the bio doesn't start at the beginning of a stripe, just split of the first (partial) chunk exactly was we currently do. If it does start at the beginning of a stripe, then split off a whole number of stripes and allocate one bio for each device. Chain those to the original bio together with any remainder (which isn't a whole stripe). I think that if you make use of bio_split() and bio_chain() properly, the code will be much simpler. NeilBrown
Attachment:
signature.asc
Description: PGP signature