Re: md_raid5 using 100% CPU and hang with status resync=PENDING, if a drive is removed during initialization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 17 Feb 2015 19:03:30 -0500 Jes Sorensen <Jes.Sorensen@xxxxxxxxxx>
wrote:

> Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> writes:
> > Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> writes:
> >> NeilBrown <neilb@xxxxxxx> writes:
> >>> On Mon, 2 Feb 2015 07:10:14 +0000 Manibalan P <pmanibalan@xxxxxxxxxxxxxx>
> >>> wrote:
> >>>
> >>>> Dear All,
> >>>> 	Any updates on this issue.
> >>>
> >>> Probably the same as:
> >>>
> >>>   http://marc.info/?l=linux-raid&m=142283560704091&w=2
> >>
> >> Hi Neil,
> >>
> >> I ran some tests on this one against the latest Linus' tree as of today
> >> (1fa185ebcbcefdc5229c783450c9f0439a69f0c1) which I believe includes all
> >> your pending 3.20 patches.
> >>
> >> I am able to reproduce Manibalan's hangs on a system with 4 SSDs if I
> >> run fio on top of a device while it is resyncing and I fail one of the
> >> devices.
> >
> > Since Manibalan mentioned this issue wasn't present in earlier kernels,
> > I started trying to track down what change caused it.
> >
> > So far I have been able to reproduce the hang as far back as 3.10.
> 
> After a lot of bisecting I finally traced the issue back to this commit:
> 
> a7854487cd7128a30a7f4f5259de9f67d5efb95f is the first bad commit
> commit a7854487cd7128a30a7f4f5259de9f67d5efb95f
> Author: Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> Date:   Thu Oct 11 13:50:12 2012 +1100
> 
>     md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
>     
>     Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
>     Suggested-by: Yair Hershko <yair@xxxxxxxxxxxxxxxxx>
>     Signed-off-by: NeilBrown <neilb@xxxxxxx>
> 
> If I revert that one I cannot reproduce the hang, applying it reproduces
> the hang consistently.

Thanks for all the research!

That is consistent with what you already reported.
You noted that it doesn't affect RAID6, and RAID6 doesn't have an RMW cycle.

Also, one  of the early emails from Manibalan contained:

handling stripe 273480328, state=0x2041 cnt=1, pd_idx=5, qd_idx=-1
, check:0, reconstruct:0
check 5: state 0x10 read           (null) write           (null) written           (null)
check 4: state 0x11 read           (null) write           (null) written           (null)
check 3: state 0x0 read           (null) write           (null) written           (null)
check 2: state 0x11 read           (null) write           (null) written           (null)
check 1: state 0x11 read           (null) write           (null) written           (null)
check 0: state 0x18 read           (null) write ffff8808029b6b00 written           (null)
locked=0 uptodate=3 to_read=0 to_write=1 failed=1 failed_num=3,-1
force RCW max_degraded=1, recovery_cp=7036944 sh->sector=273480328
for sector 273480328, rmw=2 rcw=1

So it is forcing RCW, even though a single block update is usually handled
with RMW.

In this stripe, the parity disk is '5' and disk 3 has failed.
That means to perform an RCW, we need to read the parity block in order
to reconstruct the content of the failed disk.  And if we were to do that,
we may as well just do an RMW.

So I think the correct fix would be to only force RCW when the array
is not degraded.

So something like this:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index aa76865b804b..fa8f8b94bfa8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3170,7 +3170,8 @@ static void handle_stripe_dirtying(struct r5conf *conf,
 	 * generate correct data from the parity.
 	 */
 	if (conf->max_degraded == 2 ||
-	    (recovery_cp < MaxSector && sh->sector >= recovery_cp)) {
+	    (recovery_cp < MaxSector && sh->sector >= recovery_cp &&
+	     s->failed == 0)) {
 		/* Calculate the real rcw later - for now make it
 		 * look like rcw is cheaper
 		 */


I think reverting the whole patch is not necessary and discards useful
functionality while the array is not degraded.

Can you test this patch please?

Thanks!

NeilBrown

Attachment: pgpqFAvWsLKTZ.pgp
Description: OpenPGP digital signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux