RE: [PATCH] Fix: Sometimes mdmon throws core dump during reshape

> -----Original Message-----
> From: NeilBrown [mailto:neilb@xxxxxxx]
> Sent: Wednesday, September 07, 2011 6:08 AM
> To: Kwolek, Adam
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH] Fix: Sometimes mdmon throws core dump during
> reshape
> 
> On Mon, 05 Sep 2011 12:39:55 +0200 Adam Kwolek <adam.kwolek@xxxxxxxxx>
> wrote:
> 
> > Problem was found during reshaping 2 volumes /raid0 and raid5/ in container.
> > Sometimes mdmon throws core dump due to NULL pointer exception.
> >
> > Problem occurs in scenario:
> > - managemon: is about spare activation (degraded raid4 volume == raid0 under takeover)
> > - managemon: detect level change and signals monitor (manage_member() calls replace_array())
> > - monitor: detects transition raid4/5->raid0 and sets a->container to NULL
> >            to indicate array deactivation
> > - managemon : continues his work and tries to activate spare (a->check_degraded is set).
> >               NULL pointer is passed to metadata handler activate_spare()
> >               Core dump is generated.
> >
> > To resolve this situation managemon (after monitor kick) checks again
> > a->container pointer to learn if current array is not to be deactivated.
> 
> This looks like it might be the same bug as is fixed by
>      Lukasz Dorau <lukasz.dorau@xxxxxxxxx>
> in
>   Subject: [PATCH] FIX: Mdmon crashes after changing RAID level from 1
> to 0
> 
> Does that look likely?
> 
> Thanks,
> NeilBrown

This is a very rare problem; I have reproduced it only once, and that was with the patch you point to applied.
To solve this problem completely using only Lukasz's patch, the new array-monitoring deactivation
would have to be extended to every case, and the container field should then never be used to signal deactivation.

Would you prefer that approach?
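
For illustration only, here is a minimal stand-alone C sketch of the ordering described in the patch description and of the re-check the patch adds. It is not mdmon code: the structures are reduced to the fields mentioned in this thread, monitor_kick(), manage_member_sketch() and the level_changed flag are hypothetical stand-ins, and the monitor is modelled as a plain function call instead of a separate thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the mdmon structures. */
struct container {
	int spares;
};

struct active_array {
	struct container *container;	/* NULL: array is to be deactivated */
	int check_degraded;		/* spare activation was requested */
	int level_changed;		/* assumed flag: raid4/5 -> raid0 seen */
};

/* Models the monitor reacting to the kick: on a level change it
 * clears a->container to indicate array deactivation.
 */
static void monitor_kick(struct active_array *a)
{
	if (a->level_changed)
		a->container = NULL;
}

/* Models the metadata handler: it uses the container, so a NULL
 * pointer here corresponds to the reported core dump.
 */
static int activate_spare(struct active_array *a)
{
	assert(a->container != NULL);
	return a->container->spares > 0;
}

/* Returns 1 if a spare was activated, 0 if nothing was done. */
static int manage_member_sketch(struct active_array *a)
{
	monitor_kick(a);

	/* we are after monitor kick,
	 * so container field can be cleared - check it again
	 */
	if (a->container == NULL)
		return 0;

	if (a->check_degraded)
		return activate_spare(a);

	return 0;
}
```

Without the second check, the call with level_changed set would reach activate_spare() with a NULL container, which is the failure described in the patch description.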


BR
Adam


> 
> 
> >
> > Signed-off-by: Adam Kwolek <adam.kwolek@xxxxxxxxx>
> > ---
> >
> >  managemon.c |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/managemon.c b/managemon.c
> > index d020f82..3540dac 100644
> > --- a/managemon.c
> > +++ b/managemon.c
> > @@ -475,6 +475,12 @@ static void manage_member(struct mdstat_ent *mdstat,
> >  		}
> >  	}
> >
> > +	/* we are after monitor kick,
> > +	 * so container field can be cleared - check it again
> > +	 */
> > +	if (a->container == NULL)
> > +		return;
> > +
> >  	/* We don't check the array while any update is pending, as it
> >  	 * might container a change (such as a spare assignment) which
> >  	 * could affect our decisions.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
