On Wed, 2018-01-31 at 07:43 +0000, Chongyun Wu wrote: > Hi Martin, > > My patch have tested but your patch seems to be more gracefully. May > be > there are some small issues if I am not wrong.Thanks. > > On 2018/1/30 22:18, Martin Wilck wrote: > > multipathd shouldn't try to service any more client connections > > when it receives an exit signal. Moreover, ppoll() can return > > success even if signals occured. So check for reconfigure or > > log_reset signals after handling pending client requests. > > > > Based on an analysis by Chongyun Wu. > > > > Reported-by: Chongyun Wu <wu.chongyun@xxxxxxx> > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > --- > > multipathd/main.c | 6 ++++-- > > multipathd/main.h | 2 +- > > multipathd/uxlsnr.c | 7 +++++-- > > 3 files changed, 10 insertions(+), 5 deletions(-) > > > > diff --git a/multipathd/main.c b/multipathd/main.c > > index 27cf234623d0..8e96f5dd2d7f 100644 > > --- a/multipathd/main.c > > +++ b/multipathd/main.c > > @@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int)) > > } > > > > void > > -handle_signals(void) > > +handle_signals(bool nonfatal) > > { > > if (exit_sig) { > > condlog(2, "exit (signal)"); > > + exit_sig = 0; > > If we received a exit signal should we just escape all below > commands > process to avoid the race condition between thread cancellation and > commands processing? so maybe flag exit_sig keep to 1 here? What would that be good for? We call exit_daemon(), signalling the main process to shut down. If we don't reset exit_sig, we'll call exit_daemon() again if handle_signals is ever run again. That's not necessary and might actually confuse matters. Or am I overlooking something? > > > exit_daemon(); > > } > > + if (!nonfatal) > > + return; > > if (reconfig_sig) { > > condlog(2, "reconfigure (signal)"); > > set_config_state(DAEMON_CONFIGURE); > > @@ -2200,7 +2203,6 @@ handle_signals(void) > > log_reset("multipathd"); > > pthread_mutex_unlock(&logq_lock); > > } > > - exit_sig = 0; > > reconfig_sig = 0; > > log_reset_sig = 0; > > } > > diff --git a/multipathd/main.h b/multipathd/main.h > > index ededdaba32fe..e8140feaf291 100644 > > --- a/multipathd/main.h > > +++ b/multipathd/main.h > > @@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp); > > void * mpath_pr_event_handler_fn (void * ); > > int update_map_pr(struct multipath *mpp); > > void * mpath_pr_event_handler_fn (void * pathp ); > > -void handle_signals(void); > > +void handle_signals(bool); > > > > #endif /* MAIN_H */ > > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c > > index 98ac25a68c43..dc116cf2515b 100644 > > --- a/multipathd/uxlsnr.c > > +++ b/multipathd/uxlsnr.c > > @@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > /* most of our life is spent in this call */ > > poll_count = ppoll(polls, i, &sleep_time, &mask); > > > > + handle_signals(false); > > Is it better to add and check the return value from handle_signals()? > If > return true means this a exit signal should run continue to escape > all > below processing. > If not doing that, we may need to add > pthread_cleanup_push(cleanup_lock, > &client_lock) before pthread_mutex_lock(&client_lock) in below > command > processing to avid dead lock which happen when commands processing > code > just have the lock and cancellation works to call uxsock_cleanup hook > to > get the lock and the commands processing code have no chance to > release > the lock. AFAICS it isn't necessary, because the two code parts where the lock is taken in the uxsock_listen() don't contain cancellation points. If I'm overlooking something here and pthread_cleanup_push(&client_lock) has to be used in the listener thread, that would be an independent fix. > > @@ -292,6 +293,8 @@ void * uxsock_listen(uxsock_trigger_fn > > uxsock_trigger, void * trigger_data) > > FREE(inbuf); > > } > > } > > + /* see if we got a non-fatal signal */ > > + handle_signals(true); > > above code maybe not needed. Because it's safe to make sure no > commands > be processed then to handle nonefatal signal, like code > "if(poll_count > ==0){...}" do. If here to call hanle_signals(true) again, may facing > race condition between signals processing and command processing. Do you see an actual race condition or just the potential to confuse connected clients? > That > maybe not so safe. And the handle_signals have been called above. Is > there a special reason for this which I missed? Of the handle_signals() calls above, the first one handles only fatal signals, and the others are only called if ppoll() returns -1 or 0. We have two non-fatal signals, "log_reset" and "reconfigure". "log_reset" can be safely ignored for this discussion. "reconfigure" may admittedly confuse connected clients if it's called while a session is active. The loop over the clients makes sure that commands from clients that are seen before the signal are received and processed before the handle_signals() call. The question is what to do with client sessions that remain open after the loop has finished (i.e. the current command has been handled). IMO that's not the common case, more often than not client connections just send a single command. But of course, it can't be excluded either. If we don't call handle_signals() at this point, and clients continue sending commands, ppoll() won't return -1 or 0, and the "reconfigure" signal might not be processed in a long time, which seems wrong to me. After all, signals are meant to be processed asynchronously. Note that if one of the clients sends a "reconfigure" command, it will be handled immediately, possibly confusing other clients connected at the same time. Therefore I think it's correct to handle a possibly received "reconfigure" signal at this point in the code, too. It might generallly make sense for the uxlsnr thread to pause servicing commands while the daemon is not in DAEMON_RUNNING or DAEMON_IDLE state. But that's independent of the current discussion. Summarizing, unless you or someone else points out an actual race condition (in the sense that internal multipathd data structures might be corrupted), which is introduced by this particular handle_signals() call, I'd prefer adding it. Martin -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel