On 2012-09-20 10:51 NeilBrown <neilb@xxxxxxx> wrote:
>On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote:
>
>> In ops_run_io(), if we issue a read to a dev whose previous read has
>> not returned yet, we destroy the req/rreq resources of the rdev. This
>> can lead to a hung task.
>> For example, because of bad sectors or other reasons, a read is
>> serviced through the stripe cache instead of chunk_aligned_read().
>> First: stripe 0; second: stripe 8; third: stripe 16. At the block
>> layer, the three bios are merged.
>> Because of a media error in sectors 0 to 7, the request is retried.
>> At this point raid5d reads stripe 0 again, but doing so sets
>> 'bio->bi_next = NULL', so stripes 8 and 16 never return.
>>
>> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>
>Hi,
> I'm really trying, but I cannot understand what you are saying.
>
Sorry for my bad English.

>I think the situation that you are describing involves a 24 sector request.
>This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
>
>So 'toread' on the first device of each stripe points to this bio, and
>bi_next is NULL.
>
>The "req" bio for each device is filled out to read one page and these three
>'req' bios are submitted. The block layer merges these into a single request.
>
>This request reports an error because there is a read error somewhere in the
>first 8 sectors.
>
Yes.

>So one, or maybe all, of the 'req' bios return with an error?

From my test, the req had not returned yet when the bio for stripe 0 was
sent again, and that resubmission sets bi_next to NULL.
Why do I mention bad sectors? Because the bug reproduces easily on my
disks that have bad sectors; on disks without bad sectors I cannot
reproduce it.

>Then RAID5 marks that device as being bad and reads from the other devices to
>recover the data.
>
>Where exactly does something go wrong? which bio gets bi_next set to NULL
>when it shouldn't?
>
>If you could explain with lots of detail it would help a lot.

The point is to ensure that the first read has returned before the next
read is sent on the same dev of the stripe.
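
To make the failure concrete, here is a small userspace sketch (illustrative
only; fake_bio and complete_chain are made-up names, not block-layer code) of
what I believe happens to the bi_next chain of the merged request when the
retried read re-initialises the first bio:

#include <stdio.h>

struct fake_bio {
	long             sector;	/* stands in for bi_sector */
	struct fake_bio *next;		/* stands in for bi_next   */
};

/* walk the chain and "complete" every bio still linked on it */
static void complete_chain(struct fake_bio *bio)
{
	for (; bio; bio = bio->next)
		printf("completed bio for sector %ld\n", bio->sector);
}

int main(void)
{
	struct fake_bio b16 = { 16, NULL };
	struct fake_bio b8  = {  8, &b16 };
	struct fake_bio b0  = {  0, &b8  };	/* merged request: 0 -> 8 -> 16 */

	/* raid5d retries stripe 0 and re-initialises its req bio while the
	 * merged request is still in flight ... */
	b0.next = NULL;

	/* ... so when the request completes, only sector 0 is reported and
	 * the stripe_heads at 8 and 16 wait forever */
	complete_chain(&b0);
	return 0;
}

With bi_next cut, the completions for the stripes at 8 and 16 are never
delivered, which is the hang I see.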
>Thanks,
>NeilBrown
>
>
>> ---
>>  drivers/md/raid5.c | 13 ++++++++++---
>>  drivers/md/raid5.h |  1 +
>>  2 files changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index adda94d..e52ba1b 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -547,9 +547,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>>  				rw = WRITE_FUA;
>>  			else
>>  				rw = WRITE;
>> -		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
>> -			rw = READ;
>> -		else if (test_and_clear_bit(R5_WantReplace,
>> +		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) {
>> +			/* The last read did not return, so skip this read */
>> +			if (test_and_set_bit(R5_Reading, &sh->dev[i].flags)) {
>> +				clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> +				continue;
>> +			} else
>> +				rw = READ;
>> +		} else if (test_and_clear_bit(R5_WantReplace,
>>  			    &sh->dev[i].flags)) {
>>  			rw = WRITE;
>>  			replace_only = 1;
>> @@ -700,6 +705,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>>  			pr_debug("skip op %ld on disc %d for sector %llu\n",
>>  				bi->bi_rw, i, (unsigned long long)sh->sector);
>>  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> +			clear_bit(R5_Reading, &sh->dev[i].flags);
>>  			set_bit(STRIPE_HANDLE, &sh->state);
>>  		}
>>  	}
>> @@ -1818,6 +1824,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
>>  	}
>>  	rdev_dec_pending(rdev, conf->mddev);
>>  	clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> +	clear_bit(R5_Reading, &sh->dev[i].flags);
>>  	set_bit(STRIPE_HANDLE, &sh->state);
>>  	release_stripe(sh);
>>  }
>> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> index a9fc249..a596df8 100644
>> --- a/drivers/md/raid5.h
>> +++ b/drivers/md/raid5.h
>> @@ -298,6 +298,7 @@ enum r5dev_flags {
>>  	R5_WantReplace, /* We need to update the replacement, we have read
>>  			 * data in, and now is a good time to write it out.
>>  			 */
>> +	R5_Reading,	/* a read is outstanding on the lld for this dev */
>>  };
>>
>>  /*
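
To sum up the idea of the patch, here is a minimal userspace sketch of the
guard it adds. The names fake_dev, issue_read and end_read are assumptions
made for illustration only; the real code uses the new R5_Reading bit with
test_and_set_bit() in ops_run_io() and clear_bit() in raid5_end_read_request().
The in-flight bit is claimed before a read is issued and released only in the
completion path, so a retry cannot re-issue (and re-initialise) a bio that has
not come back yet:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_dev {
	atomic_flag reading;		/* stands in for R5_Reading */
};

/* try to issue a read; refuse if one is already outstanding */
static bool issue_read(struct fake_dev *dev, int sector)
{
	if (atomic_flag_test_and_set(&dev->reading)) {
		printf("sector %d: read already in flight, skipped\n", sector);
		return false;
	}
	printf("sector %d: read issued\n", sector);
	return true;
}

/* completion handler: drop the flag so the next read may be issued */
static void end_read(struct fake_dev *dev, int sector)
{
	printf("sector %d: read completed\n", sector);
	atomic_flag_clear(&dev->reading);
}

int main(void)
{
	struct fake_dev dev = { ATOMIC_FLAG_INIT };

	issue_read(&dev, 0);	/* first read goes out                 */
	issue_read(&dev, 0);	/* retry skipped, in-flight bio intact */
	end_read(&dev, 0);	/* completion clears the flag          */
	issue_read(&dev, 0);	/* now the retry can be issued         */
	return 0;
}

The same release is also needed in the "skip op" path of ops_run_io(), which
is why the patch clears R5_Reading there as well.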