On Wed, 2018-03-21 at 02:43 +0000, Chongyun Wu wrote: > On 2018/3/21 0:51, Martin Wilck wrote: > > The ppoll() calls of the uxlsnr thread are vital for proper > > functioning of > > multipathd. If the uxlsnr thread can't open the socket or fails to > > call ppoll() > > for other reasons, quit the daemon. If we don't do that, multipathd > > may > > hang in a state where it can't be terminated any more, because the > > uxlsnr > > thread is responsible for handling all signals. This happens e.g. > > if > > systemd's multipathd.socket is running in and multipathd is started > > from > > outside systemd. > > > > 24f2844 "multipathd: fix signal blocking logic" has made this > > problem more > > severe. Before that patch, the signals weren't actually blocked in > > any thread. > > That's not to say 24f2844 was wrong. I still think it's correct, we > > just > > need this one on top. > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > --- > > multipathd/uxlsnr.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c > > index cdafd82943e7..6f666663fc6f 100644 > > --- a/multipathd/uxlsnr.c > > +++ b/multipathd/uxlsnr.c > > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > > > if (ux_sock == -1) { > > condlog(1, "could not create uxsock: %d", errno); > > - return NULL; > > + exit_daemon(); > > } > > > > pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock); > > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * > > sizeof(struct pollfd)); > > if (!polls) { > > condlog(0, "uxsock: failed to allocate poll > > fds"); > > - return NULL; > > + exit_daemon(); > > } > > sigfillset(&mask); > > sigdelset(&mask, SIGINT); > > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > > > /* something went badly wrong! */ > > condlog(0, "uxsock: poll failed with %d", > > errno); > > + exit_daemon(); > > break; > > } > > Hi Martin, > > Your analysis is reasonable. It is necessary to deal with fatal > error > not only to return, if not doing this multipathd can't exit normally > and > multipathd commands can't work any more. I think your patch is OK, > but I > have some ideas inspired by your patch. > Calling exit_daemon() is to shut down the multipathd, relay on the > outside to pull multipathd again. Is there a function can be use to > deal > with fatal error? I don't think so, that's why I call the error "fatal". But feel free to come up with a more intelligent solution. In the only case with practical relevance I've seen so far (socket can't be opened because it's open in systemd), it's sufficient that multipathd is started via systemd (this should be done in production anyway), or that the admin runs "systemctl stop multipathd.socket" before starting the daemon manually. In general, I don't think it makes sense to try and be overly smart. Errors that prevent the ux socket from being opened (with the mentioned exception) are likely to be so severe that any attempts to work around them will probably fail as well. Martin -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel