On 2012-09-20 11:24 NeilBrown <neilb@xxxxxxx> wrote:
>On Thu, 20 Sep 2012 11:04:46 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote:
>
>> On 2012-09-20 10:51 NeilBrown <neilb@xxxxxxx> wrote:
>> >On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@xxxxxxxxx> wrote:
>> >
>> >> In func 'ops_run_bio', if you read a dev while the previous read of
>> >> that dev has not yet returned, it will destroy the req/rreq resources
>> >> of the rdev. This may cause a hung task.
>> >> For example, because of a bad sector or for other reasons, a read may
>> >> go through the stripe cache instead of chunk_aligned_read.
>> >> First: stripe 0; second: stripe 8; third: stripe 16. At the block
>> >> layer, the three bios are merged.
>> >> Because of a media error in sectors 0 to 7, the request is retried.
>> >> At this point, raid5d reads stripe 0 again, but it sets
>> >> 'bio->next = NULL', so stripes 8 and 16 never return.
>> >>
>> >> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>> >
>> >Hi,
>> > I'm really trying, but I cannot understand what you are saying.
>> >
>> Sorry for my bad English.
>> >I think the situation that you are describing involves a 24 sector request.
>> >This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.
>> >
>> >So 'toread' on the first device of each stripe points to this bio, and
>> >bi_next is NULL.
>> >
>> >The "req" bio for each device is filled out to read one page and these three
>> >'req' bios are submitted. The block layer merges these into a single request.
>> >
>> >This request reports an error because there is a read error somewhere in the
>> >first 8 sectors.
>> >
>> Yes.
>> >So one, or maybe all, of the 'req' bios return with an error?
>> From my test: while req had not yet returned, the bio for stripe 0 was
>> sent again. This operation sets bi_next to NULL.
>
>Are you saying that we send another bio before the first one has returned?
>That shouldn't be possible as sh->count will prevent it from happening.
>While there is an outstanding request, sh->count will be >0, and until
>sh->count is 0, we won't try to send any more requests.
>
>So I still don't understand. Please try to provide as much detail as
>possible. If it is easier, write in your own language and use
>translate.google.com to convert to English.
>
>Thanks,
>NeilBrown

Hi,
I wrote a shell script that can reproduce this bug.
mdadm version used (output of mdadm -V):
mdadm - v3.3-pre - Unreleased

#!/bin/bash
# WARNING: destructive. hdparm --make-bad-sector deliberately corrupts
# sectors on /dev/sdc, /dev/sdd and /dev/sde.
declare -i count
declare -i sector
count=0
sector=2048
# Mark 40 sectors as bad on each of the three disks, starting at sector
# 2048; the gap between marked sectors grows by 8 on every pass.
while true
do
	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sdc > /dev/null
	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sdd > /dev/null
	hdparm --make-bad-sector $sector --yes-i-know-what-i-am-doing /dev/sde > /dev/null
	let count++
	let sector+=$count*8
	if (($count == 40)); then
		break
	fi
done
# Repeatedly create a degraded 4-device RAID5 (4 KiB chunk, one member
# missing) and do a single 10 MiB O_DIRECT read from it.
while true
do
	mdadm -S /dev/md0
	mdadm -CR /dev/md0 -l5 -c4 -n4 missing /dev/sd[cde]
	dd if=/dev/md0 of=/dev/null bs=10M count=1 iflag=direct
	sleep 1
done

Thanks
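The bad-sector loop in the reproducer grows its step by 8 sectors on each pass, so the 40 marked sectors are not evenly spaced. A dry run of just that loop, sketched below under the assumption of bash, prints the sector numbers that would be handed to hdparm without touching any disk, so they can be checked before running the destructive version. The function name print_marked_sectors is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Dry run of the bad-sector loop from the reproducer: print the sector
# numbers it would pass to "hdparm --make-bad-sector", touching no disk.
# print_marked_sectors is a hypothetical helper, not part of the original.
print_marked_sectors() {
	local -i count=0
	local -i sector=2048
	while true
	do
		echo "$sector"              # the reproducer marks this sector on sdc, sdd and sde
		(( count += 1 ))
		(( sector += count * 8 ))   # the step grows by 8 sectors each pass
		if (( count == 40 )); then
			break
		fi
	done
}

print_marked_sectors
```

Running it prints 40 numbers: 2048, 2056, 2072, and so on up to 8288, with each gap 8 sectors larger than the one before it.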