On 2/13/19 7:28 AM, Pawel Baldysiak wrote:
> Since the patch
> c76242c5("mdmon: get safe mode delay file descriptor early"),
> safe_mode_dalay is set properly by initrd mdmon.
> But in some cases with filesystem traffic since the very
> start of the system, it might take a while to transit to clean state.
> Due to fact that new mdmon does not wait for the old one to exit -
> it might happen that the new one switches safe_mode_delay back to
> seconds, before old one exits. As the result two mdmons are running
> concurrently on same array.
>
> Wait for the old mdmon to exit by pinging it with SIGUSR1 signal,
> just in case it is sleeping.
>
> Signed-off-by: Pawel Baldysiak <pawel.baldysiak@xxxxxxxxx>
> ---
>  mdmon.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/mdmon.c b/mdmon.c
> index 0955fcc5..03e6e427 100644
> --- a/mdmon.c
> +++ b/mdmon.c
> @@ -171,6 +171,7 @@ static void try_kill_monitor(pid_t pid, char *devname, int sock)
>  	int fd;
>  	int n;
>  	long fl;
> +	int rv;
>
>  	/* first rule of survival... don't off yourself */
>  	if (pid == getpid())
> @@ -201,9 +202,16 @@ static void try_kill_monitor(pid_t pid, char *devname, int sock)
>  	fl &= ~O_NONBLOCK;
>  	fcntl(sock, F_SETFL, fl);
>  	n = read(sock, buf, 100);
> -	/* Ignore result, it is just the wait that
> -	 * matters
> -	 */
> +
> +	/* If there is I/O going on it might took some time to get to
> +	 * clean state. Wait for monitor to exit fully to avoid races.
> +	 * Ping it with SIGUSR1 in case that it is sleeping */
> +	do {
> +		rv = kill(pid, SIGUSR1);
> +		if (rv < 0)
> +			break;
> +		usleep(200000);
> +	} while (1);

I think the principle of this is fine, but doing an indefinite while(1)
raises the little hairs on the back of my neck. Could you limit it to
say 5 or 10 runs?

Thanks,
Jes