RE: [PATCH] fix: mdadm -Ss for external metadata don't stop container

> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Tuesday, December 07, 2010 11:16 AM
> To: Hawrylewicz Czarnowski, Przemyslaw
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed; Labun,
> Marcin; Czarnowska, Anna
> Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> container
> 
> On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> <przemyslaw.hawrylewicz.czarnowski@xxxxxxxxx> wrote:
> 
> > Neil,
> >
> > The one below is a fix for a problem we encounter quite often when we
> > try to stop all arrays with mdadm -Ss. The main problem is that mdmon
> > holds the container device open and then exits. The system takes quite
> > a long time to clean up, and mdadm invokes the STOP_ARRAY ioctl while
> > the device is still open.
> > A second solution is to retry the ioctl in mdadm after mdmon exits, but
> > closing the handle is what should be done before the process exits.
> > Take a look at the patch below:
> >
> > --
> > Sometimes (~50%) mdadm -Ss cannot stop the container because mdmon opens
> > its device and does not close it before exit(). The period between open
> > and release of the handle is too long and md is not able to stop the
> > device. Releasing the handle before exit does not block md.
> >
> > Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@xxxxxxxxx>
> 
> I've applied this, but I'm not 100% sure it is completely safe.
> mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> assembling another array in the container.
> mdadm will get an O_EXCL and then try sending a signal to mdmon.  If it
> succeeds, it knows mdmon is still running.  But this patch might open a
> window where mdadm can get O_EXCL, and a signal still works.
In the manual pages, the behavior of O_EXCL is only defined in connection with the O_CREAT flag, which is not present in open_dev_excl (of course :). I have just tested open(name, O_RDWR | O_EXCL) a few times on the same file and it does not block other processes...
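
For reference, a minimal sketch of that kind of test, assuming Linux; the helper names are made up for illustration. On a regular file both opens are expected to succeed, because O_EXCL without O_CREAT has no effect there. The Linux open(2) page does document one exception: on a block device, O_EXCL without O_CREAT makes open() fail with EBUSY if the device is in use. An md container is a block device, so a test on a regular file may not say much about what open_dev_excl sees.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: open `path` twice with O_RDWR | O_EXCL (no O_CREAT)
 * and return how many of the two opens succeeded (0, 1 or 2).
 * If the second open fails, *second_errno receives its errno. */
int count_excl_opens(const char *path, int *second_errno)
{
	int ok = 0;
	int fd1, fd2;

	*second_errno = 0;

	fd1 = open(path, O_RDWR | O_EXCL);
	if (fd1 >= 0)
		ok++;

	/* The first descriptor is still open at this point. */
	fd2 = open(path, O_RDWR | O_EXCL);
	if (fd2 >= 0)
		ok++;
	else
		*second_errno = errno;

	if (fd2 >= 0)
		close(fd2);
	if (fd1 >= 0)
		close(fd1);
	return ok;
}

/* Hypothetical helper: run the check on a scratch regular file.
 * Returns -1 if the scratch file could not be created. */
int excl_opens_on_regular_file(void)
{
	char path[] = "/tmp/excl-test-XXXXXX";
	int fd = mkstemp(path);
	int err, ok;

	if (fd < 0)
		return -1;
	close(fd);

	ok = count_excl_opens(path, &err);
	unlink(path);
	return ok;
}
```

Calling count_excl_opens() on a block device that is in use (for example a mounted one, as root) instead of a scratch file would be the more representative experiment; there the reported errno should be EBUSY.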

> 
> However I'm not certain that window wasn't already there, and this might
> just make it a bit bigger.
> I've put a note in my to-do list to look into this more closely and figure
> out if there is a problem, and if so, how to fix it.
Yes, this fix does not close the issue completely. First, the window exists and mdadm still has a chance to hit it. Second, the monitor should wait until the manager finishes its work (which is not the case right now). I have tried "return -1" instead of exit(0), but the manager seems to miss the ping performed right before...
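
Since the window remains, the other option mentioned in the patch description (retrying the ioctl in mdadm) could look roughly like the sketch below. This is only an illustration, not mdadm code: retry_while_busy() and busy_n_times() are hypothetical names, and the retry count and delay are arbitrary.

```c
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of mdadm: call op(ctx) until it succeeds,
 * fails with an errno other than EBUSY, or max_tries attempts are used up.
 * Sleeps delay_ms milliseconds between attempts.
 * Returns the result of the last op() call (-1 if max_tries <= 0). */
int retry_while_busy(int (*op)(void *ctx), void *ctx,
		     int max_tries, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int rv = -1;
	int i;

	for (i = 0; i < max_tries; i++) {
		rv = op(ctx);
		if (rv >= 0 || errno != EBUSY)
			break;
		/* No point in sleeping after the final attempt. */
		if (i + 1 < max_tries)
			nanosleep(&delay, NULL);
	}
	return rv;
}

/* Stand-in for the real operation, for demonstration only: fails with
 * EBUSY while *remaining > 0, decrementing it each time, then succeeds. */
int busy_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```

In mdadm the callback would presumably wrap the STOP_ARRAY ioctl from <linux/raid/md_u.h> on the container fd. A retry only hides the race rather than removing it, so it does not replace closing the handle in mdmon before exit.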

> 
> Thanks,
> NeilBrown
> 
> 
> 
> > ---
> >  monitor.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 59b4181..f166bc8 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -525,6 +525,7 @@ static int wait_and_act(struct supertype *container, int nowait)
> >  				remove_pidfile(container->devname);
> >  			exit_now = 1;
> >  			signal_manager();
> > +			close(fd);
> >  			exit(0);
> >  		}
> >  	}
