[PATCH v3] redirect: protect against tgtd process hang caused by cluster software hang

If the child process spawned to run the redirect callback script hangs, e.g.
because of load or a bug in the application the script is dealing with, tgt
can hang forever. Protect against that by calling select() on the fd from
which tgt is expected to read, with a timeout of 100ms. If the timeout
expires, tgt terminates the child process and fails the initiator login
attempt. This is not the ultimate solution, which would be to use
tgt_event_add et al. and make the redirect code fully asynchronous, but it
adds protection and makes things much better in that respect.

Signed-off-by: Alexander Nezhinsky <alexandern@xxxxxxxxxxxx>
Signed-off-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>

-----
changes from V1:
	added a comment and a few formatting changes

changes from V2:
	fixed some checkpatch warnings
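
For reference, the timed wait described above can be sketched in isolation
as below. This is only an illustration, not the code in the patch:
wait_readable() and its timeout_ms parameter are hypothetical names that do
not exist in tgt. It assumes a POSIX system and an fd below FD_SETSIZE.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of tgt: wait until fd becomes readable
 * or timeout_ms milliseconds pass.
 * Returns > 0 if fd is readable (this includes EOF), 0 on timeout,
 * < 0 on error with errno set by select().
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct timeval tv;
	fd_set rfds;
	int ret;

	/* select() only handles descriptors below FD_SETSIZE */
	assert(fd >= 0 && fd < FD_SETSIZE);

	do {
		/*
		 * select() may modify both the fd set and the timeout,
		 * so set them up again before every call. Note that a
		 * retry after EINTR restarts the full timeout.
		 */
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		tv.tv_sec = timeout_ms / 1000;
		tv.tv_usec = (timeout_ms % 1000) * 1000;
		ret = select(fd + 1, &rfds, NULL, NULL, &tv);
	} while (ret < 0 && errno == EINTR);

	return ret;
}
```

A caller would read() from the fd only when the return value is positive,
and treat zero or a negative value as a failed callback, as the patch does
with ret_sel.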

 usr/tgtd.c |   29 ++++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/usr/tgtd.c b/usr/tgtd.c
index 066f46e..bc74469 100644
--- a/usr/tgtd.c
+++ b/usr/tgtd.c
@@ -309,7 +309,24 @@ int call_program(const char *cmd, void (*callback)(void *data, int result),
 		eprintf("execv failed for: %s, %m\n", cmd);
 		exit(-1);
 	} else {
+		struct timeval tv;
+		fd_set rfds;
+		int ret_sel;
+
 		close(fds[1]);
+		/* 0.1 second is okay, as the initiator will retry anyway */
+		do {
+			FD_ZERO(&rfds);
+			FD_SET(fds[0], &rfds);
+			tv.tv_sec = 0;
+			tv.tv_usec = 100000;
+			ret_sel = select(fds[0]+1, &rfds, NULL, NULL, &tv);
+		} while (ret_sel < 0 && errno == EINTR);
+		if (ret_sel <= 0) { /* error or timeout */
+			eprintf("timeout on redirect callback, terminating "
+				"child pid %d\n", pid);
+			kill(pid, SIGTERM);
+		}
 		do {
 			ret = waitpid(pid, &i, 0);
 		} while (ret < 0 && errno == EINTR);
@@ -318,11 +335,13 @@ int call_program(const char *cmd, void (*callback)(void *data, int result),
 			close(fds[0]);
 			return ret;
 		}
-		ret = read(fds[0], output, op_len);
-		if (ret < 0) {
-			eprintf("failed to get the output from: %s\n", cmd);
-			close(fds[0]);
-			return ret;
+		if (ret_sel > 0) {
+			ret = read(fds[0], output, op_len);
+			if (ret < 0) {
+				eprintf("failed to get output from: %s\n", cmd);
+				close(fds[0]);
+				return ret;
+			}
 		}

 		if (callback)
