> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Tuesday, December 07, 2010 11:16 AM
> To: Hawrylewicz Czarnowski, Przemyslaw
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed;
> Labun, Marcin; Czarnowska, Anna
> Subject: Re: [PATCH] fix: mdadm -Ss for external metadata don't stop
> container
>
> On Tue, 7 Dec 2010 06:44:21 +0000 "Hawrylewicz Czarnowski, Przemyslaw"
> <przemyslaw.hawrylewicz.czarnowski@xxxxxxxxx> wrote:
>
> > Neil,
> >
> > The one below is a fix for the problem we encounter quite often when
> > we try to stop all arrays with mdadm -Ss. The main problem is that
> > mdmon holds open container device and then exits. The time that system
> > make clean up is quite long and mdadm invokes ARRAY_STOP ioctl when
> > device is still opened.
> > Second resolution is to retry ioctl in mdadm after mdmon exits, but
> > closing handle is I what should be done before process exist.
> > Take a look at the patch below:
> >
> > --
> > Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its
> > device and do not close it before exit(). The period between open and
> > release of handle is too long and md is not able stop device.
> > Releasing handle before exit does not block md.
> >
> > Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@xxxxxxxxx>
>
> I've applied this, but I'm not 100% sure it is completely safe.
> mdmon holds the O_EXCL open to be sure that mdadm isn't creating or
> assembling another array in the container.
> mdadm will get an O_EXCL and then try sending a signal to mdmon. If it
> succeeds, it knows mdmon is still running. But this patch might open a
> window where mdadm can get O_EXCL, and a signal still works.

According to the manual pages, the behavior of O_EXCL is defined only in
combination with the O_CREAT flag, which is not present in open_dev_excl
(of course :)). I have just tested open(name, O_RDWR | O_EXCL) a few times
on the same file, and it does not block other processes...
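
For reference, the test described above can be approximated with the small sketch below. It is only an illustration under assumptions: the helper names count_excl_opens() and excl_opens_on_tmpfile() are hypothetical and not part of mdadm, and the scratch file is a regular file under /tmp. On Linux, open(2) documents block devices as an exception where O_EXCL without O_CREAT fails with EBUSY if the device is in use, so a regular file may not behave the same way as the container device.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_OPENS 16

/*
 * Hypothetical helper (not from mdadm): open `path` `n` times with
 * O_RDWR | O_EXCL and no O_CREAT, keep every descriptor open at the
 * same time, and return how many of the opens succeeded.
 */
int count_excl_opens(const char *path, int n)
{
    int fds[MAX_OPENS];
    int ok = 0;

    assert(n > 0 && n <= MAX_OPENS);
    for (int i = 0; i < n; i++) {
        fds[i] = open(path, O_RDWR | O_EXCL);
        if (fds[i] >= 0)
            ok++;
    }
    /* Close only after all opens were attempted, so they overlap. */
    for (int i = 0; i < n; i++)
        if (fds[i] >= 0)
            close(fds[i]);
    return ok;
}

/*
 * Create a scratch regular file (assumed location: /tmp), run the
 * check against it, and remove the file again. Returns -1 if the
 * scratch file could not be created.
 */
int excl_opens_on_tmpfile(int n)
{
    char path[] = "/tmp/excl-test-XXXXXX";
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    close(fd);
    ok = count_excl_opens(path, n);
    unlink(path);
    return ok;
}
```

Calling excl_opens_on_tmpfile(3) and comparing the result with 3 shows whether every overlapping open was allowed; pointing count_excl_opens() at a block device path instead would be the more relevant experiment for the mdmon case.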

> However I'm not certain that window wasn't already there, and this might
> just make it a bit bigger.
> I've put a note in my to-do list to look into this more closely and
> figure out if there is a problem, and if so, how to fix it.

Yes, this fix does not close the issue completely. First, the window still
exists and mdadm still has a chance to hit it. Second, the monitor should
wait until the manager finishes its work (which is not guaranteed right
now). I have tried "return -1" instead of exit(0), but the manager seems to
miss the ping performed right before...

>
> Thanks,
> NeilBrown
>
>
> > ---
> >  monitor.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/monitor.c b/monitor.c
> > index 59b4181..f166bc8 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -525,6 +525,7 @@ static int wait_and_act(struct supertype *container, int nowait)
> >  			remove_pidfile(container->devname);
> >  			exit_now = 1;
> >  			signal_manager();
> > +			close(fd);
> >  			exit(0);
> >  		}
> >  	}
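
The alternative mentioned in the original report, retrying the stop ioctl in mdadm after mdmon exits, could look roughly like the sketch below. This is not mdadm code: retry_while_busy(), the stop_fn callback type, and fake_stop() are hypothetical names, and the sketch assumes the kernel reports the still-open device by failing the call with errno set to EBUSY. The real ioctl is replaced by a callback so the retry logic can be exercised without an md device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set. */
typedef int (*stop_fn)(void *arg);

/*
 * Call `op` up to `attempts` times, sleeping `delay_ms` milliseconds
 * between tries, for as long as it fails with EBUSY. Any other error
 * is returned immediately. Returns the result of the last call.
 */
int retry_while_busy(stop_fn op, void *arg, int attempts, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    int rv = -1;

    assert(op != NULL && attempts > 0);
    for (int i = 0; i < attempts; i++) {
        rv = op(arg);
        if (rv == 0 || errno != EBUSY)
            break;
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);
    }
    return rv;
}

/*
 * Stand-in for the real stop ioctl: fails with EBUSY until the
 * counter pointed to by `arg` has been decremented to zero.
 */
int fake_stop(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

With a counter of 2 and five attempts, retry_while_busy(fake_stop, &counter, 5, 1) succeeds on the third call. A retry loop like this only narrows the timing problem on the mdadm side; it does not address the O_EXCL window discussed above.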