On Thu, 17 Mar 2011 15:53:39 +0200 Or Gerlitz <ogerlitz@xxxxxxxxxxxx> wrote: > commit fb0bd886021239aac72bb34a497af0cca946f1e7 > Author: Or Gerlitz <ogerlitz@xxxxxxxxxxxx> > Date: Thu Mar 17 15:22:30 2011 +0200 > > redirect: protect again tgtd process hang as of cluster software hang > > If the child process spawned to run the redirect callback script hangs, e.g > as of load/bug in the application which this script is dealing with, tgt > can hang forever. Protect against that by selecting the fd from which tgt > is expected to read, for up to 100ms, if the timeout expires, tgt terminates > the child process and fail the initiator login attempt. While not being the > ultimate solution, of using tgt_event_add et al. and making the redirect > code fully asynchronous, this patch adds protection and makes things much > better in that respect. > > Signed-off-by: Alexander Nezhinsky <alexandern@xxxxxxxxxxxx> > Signed-off-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx> > ----- The temporary solution is fine by me. > added a comment and few formatting changes > > diff --git a/usr/tgtd.c b/usr/tgtd.c > index 946cd3e..4f71e1e 100644 > --- a/usr/tgtd.c > +++ b/usr/tgtd.c > @@ -319,7 +319,24 @@ int call_program(const char *cmd, void (*callback)(void *data, int result), > eprintf("execv failed for: %s, %m\n", cmd); > exit(-1); > } else { > + struct timeval tv; > + fd_set rfds; > + int ret_sel; > + > close(fds[1]); > + /* 0.1 second is okay, as the initiator will retry anyway */ > + do { > + FD_ZERO(&rfds); > + FD_SET(fds[0],&rfds); > + tv.tv_sec = 0; > + tv.tv_usec = 100000; > + ret_sel = select(fds[0]+1, &rfds, NULL, NULL, &tv); > + } while (ret_sel < 0 && errno == EINTR); > + if (ret_sel <= 0) { /* error or timeout */ > + eprintf("timeout on redirect callback, \ > + terminating child pid %d\n", pid); > + kill(pid, SIGTERM); > + } Why this is necessary before waitpid()? I thought that we need to make sure that fds[0] is readable (with a timeout) before calling read() for fds[0]. > do { > ret = waitpid(pid, &i, 0); > } while (ret < 0 && errno == EINTR); > @@ -328,11 +345,13 @@ int call_program(const char *cmd, void (*callback)(void *data, int result), > close(fds[0]); > return ret; > } > - ret = read(fds[0], output, op_len); > - if (ret < 0) { > - eprintf("failed to get the output from: %s\n", cmd); > - close(fds[0]); > - return ret; > + if (ret_sel > 0) { > + ret = read(fds[0], output, op_len); > + if (ret < 0) { > + eprintf("failed to get the output from: %s\n", cmd); > + close(fds[0]); > + return ret; > + } > } > > if (callback) > > -- > To unsubscribe from this list: send the line "unsubscribe stgt" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe stgt" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html