On Mon, Sep 5, 2011 at 3:39 AM, Adam Kwolek <adam.kwolek@xxxxxxxxx> wrote:
> Problem was found during reshaping 2 volumes /raid0 and raid5/ in container.
> Sometimes mdmon throws core dump due to NULL pointer exception.
>
> Problem occurs in scenario:
> - managemon: is about spare activation (degraded raid4 volume == raid0 under takeover)
> - managemon: detect level change and signals monitor (manage_member() calls replace_array())
> - monitor: detects transition raid4/5->raid0 and sets a->container to NULL
>   to indicate array deactivation

Maybe I have lost track of the reshape implementation, but I don't see
where the monitor sets ->container to NULL during a reshape.  Do you
mean deactivating mdmon for the array after the reshape completes?

> - managemon : continues his work and tries to activate spare (a->check_degraded is set).
>   NULL pointer is passed to metadata handler activate_spare()
>   Core dump is generated.
>
> To resolve this situation managemon (after monitor kick) checks again
> a->container pointer to learn if current array is not to be deactivated.
[..]
> diff --git a/managemon.c b/managemon.c
> index d020f82..3540dac 100644
> --- a/managemon.c
> +++ b/managemon.c
> @@ -475,6 +475,12 @@ static void manage_member(struct mdstat_ent *mdstat,
>  		}
>  	}
>
> +	/* we are after monitor kick,
> +	 * so container field can be cleared - check it again
> +	 */
> +	if (a->container == NULL)
> +		return;
> +

Isn't this still racy, since we don't wait for the monitor to run
before proceeding?
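
To make the concern about the re-check concrete, here is a minimal
single-threaded sketch of the interleaving.  It is not mdmon code:
struct container, struct array, monitor_deactivate() and
recheck_then_use() are hypothetical stand-ins, and the "between"
callback only marks the point where the monitor could run if the
manager does not wait for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not mdmon's real types. */
struct container {
	int spares;
};

struct array {
	struct container *container;	/* cleared on deactivation */
};

/* Plays the role of the monitor deactivating the array. */
void monitor_deactivate(struct array *a)
{
	a->container = NULL;
}

/* Hook invoked where the other thread could be scheduled. */
typedef void (*interleave_fn)(struct array *a);

/*
 * Check-then-use with no synchronization between the two steps.
 * Returns  1 if the pointer was still valid at the point of use,
 *          0 if the re-check caught the NULL and bailed out,
 *         -1 if the re-check passed but the pointer was NULL by the
 *            time it would have been handed to the metadata handler.
 */
int recheck_then_use(struct array *a, interleave_fn between)
{
	assert(a != NULL);

	if (a->container == NULL)
		return 0;		/* the added re-check */

	if (between)
		between(a);		/* monitor runs only now */

	if (a->container == NULL)
		return -1;		/* would be the NULL dereference */

	return 1;
}
```

With an active array, recheck_then_use(&a, monitor_deactivate) returns
-1: the re-check passed, yet the pointer was already NULL at the point
of use.  The re-check only helps if the monitor is known to have run
before it.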