Hi, Ilari Liusvaara wrote: > This helper function copies bidirectional stream of data between > stdin/stdout and specified file descriptors. >From this description, I am expecting something to the effect of sendfile(output, input, NULL, SIZE_MAX); but this is much longer than that. Why? Let's see... > --- a/transport-helper.c > +++ b/transport-helper.c > @@ -862,3 +862,257 @@ int transport_helper_init(struct transport *transport, const char *name) > transport->smart_options = &(data->transport_options); > return 0; > } > + > + > +#define BUFFERSIZE 4096 > +#define PBUFFERSIZE 8192 Magic numbers. Where do these come from? Were they tuned or are they off the top of the head? (Either is fine; it just is nice to know.) > +/* Print bidirectional transfer loop debug message. */ > +static void transfer_debug(const char *fmt, ...) > +{ > + va_list args; > + char msgbuf[PBUFFERSIZE]; > + static int debug_enabled = -1; > + > + if (debug_enabled < 0) > + debug_enabled = getenv("GIT_TRANSLOOP_DEBUG") ? 1 : 0; > + if (!debug_enabled) > + return; > + > + sprintf(msgbuf, "Transfer loop debugging: "); > + va_start(args, fmt); > + vsprintf(msgbuf + strlen(msgbuf), fmt, args); > + va_end(args); > + fprintf(stderr, "%s\n", msgbuf); > +} Why this instead of just vfprintf? (vreportf() in usage.c does the same thing; maybe there is some portability reason?) This can overflow if the caller is not careful. Might be clearer to write va_start(args, fmt); vsnprintf(msgbuf, sizeof(msgbuf), fmt, args); va_end(args); vfprintf(stderr, "Transfer loop debugging: %s\n", msgbuf); like vreportf() does. > +/* Load the parameters into poll structure. Return number of entries loaded */ Not clear to me what this comment means. What is this function used for? > +static int load_poll_params(struct pollfd *polls, size_t inbufuse, > + size_t outbufuse, int in_hup, int out_hup, int in_closed, > + int out_closed, int socket_mode, int input_fd, int output_fd) Scary. Maybe a params struct would make callers easier to understand? What the parameters mean (I'm guessing): polls - buffer for file descriptors to poll on. There should be room for 4. inbufuse - < BUFFERSIZE if we can handle more input on stdin outbufuse - < BUFFERSIZE if we can handle more input on input_fd --- why is this called "out"? in_hup - whether the other end of stdin has already hung up out_hup - whether the other end of input_fd has already hung up in_closed - ??? out_closed - ??? socket_mode - true if input_fd == output_fd input_fd, output_fd - file descriptors [...] > + if (!in_hup && inbufuse < BUFFERSIZE) { > + stdin_index = nextindex++; > + polls[stdin_index].fd = 0; > + transfer_debug("Adding stdin to fds to wait for"); > + } > + if (!out_hup && outbufuse < BUFFERSIZE) { > + input_index = nextindex++; > + polls[input_index].fd = input_fd; > + transfer_debug("Adding remote input to fds to wait for"); > + } > + if (!out_closed && outbufuse > 0) { > + stdout_index = nextindex++; > + polls[stdout_index].fd = 1; > + transfer_debug("Adding stdout to fds to wait for"); > + } Repetitive. Maybe this could be factored out as a mini-function: consider_waiting_for(in_hup, inbufuse, &stdin_index, nextindex, polls, 0, "stdin"); Hmm, never mind. [...] > +} > + > +static int transfer_handle_events(struct pollfd* polls, char *in_buffer, > + char *out_buffer, size_t *in_buffer_use, size_t *out_buffer_use, > + int *in_hup, int *out_hup, int *in_closed, int *out_closed, > + int socket_mode, int poll_count, int input, int output) Scarier. > +{ > + int i, r; > + for(i = 0; i < poll_count; i++) { Long loop. What is it for? (A comment might help.) Maybe the body could get its own function? > + /* Handle stdin. */ > + if (polls[i].fd == 0 && polls[i].revents & (POLLIN | POLLHUP)) { Or better, the code to handle each event might get its own function. This one reads as much data as possible into in_buffer[], taking care to handle EOF appropriately. [...] > + } > + > + /* Handle remote end input. */ > + if (polls[i].fd == input && > + polls[i].revents & (POLLIN | POLLHUP)) { This one reads as much data as possible into out_buffer[], taking care to handle EOF appropriately. Presumably out_buffer[] means data scheduled for output. > + transfer_debug("remote input is readable"); > + r = read(input, out_buffer + *out_buffer_use, > + BUFFERSIZE - *out_buffer_use); > + if (r < 0 && errno != EWOULDBLOCK && errno != EAGAIN && > + errno != EINTR) { > + perror("read(connection) failed"); > + return 1; Why use perror() instead of error() which can be overridden by setting error_routine? Why return 1 instead of the usual -1 for error? > + } else if (r == 0) { > + transfer_debug("remote input EOF"); > + *out_hup = 1; > + if (!*out_buffer_use) { > + close(1); > + *out_closed = 1; > + transfer_debug("Closed stdout"); > + } else > + transfer_debug("Delaying stdout close because output buffer has data"); Why is stdout closed here? Could that be taken care of later by checking *out_hup? [...] > + r = write(1, out_buffer, *out_buffer_use); [...] > + if (*out_buffer_use > 0) > + memmove(out_buffer, out_buffer + r, > + *out_buffer_use); This only writes as much data to stdout as one write() allows, to avoid deadlock, presumably. > + if (*out_hup && !*out_buffer_use) { > + close(1); After each write() we check *out_hup, but only after a write. Would be simpler to unconditionally check *out_hup and avoid the duplicate code before; would that slow this down? > + /* Handle remote end output. */ Dual to the above. [...] > +/* Copy data from stdin to output and from input to stdout. */ Ah. I think this belongs in the commit message, too. :) Still I wonder "why". What is the motivational example? > +int bidirectional_transfer_loop(int input, int output) > +{ [...] > + while (1) { A typical poll loop. Nothing scary here. > + int r; > + poll_count = load_poll_params(polls, in_buffer_use, > + out_buffer_use, in_hup, out_hup, in_closed, out_closed, > + socket_mode, input, output); > + if (!poll_count) { > + transfer_debug("Transfer done"); > + break; return 0; would avoid the reader having to scroll down, I think. Okay, so we actually have the effect of sendfile(output_fd, 0, NULL, SIZE_MAX); sendfile(1, input_fd, NULL, SIZE_MAX); interleaved to avoid deadlock. In other words, this interchanges input for output file descriptors. The main application is remote-fd, which needs to do this to forward input to and output from a parent process. For remote-ext, for a moment one might imagine one could avoid the trouble by letting the child inherit stdin and stdout. Unfortunately, remote-ext needs to be able to prepend some data to its child's input stream. So the effect of sendfile(output_fd, 0, NULL, SIZE_MAX); is still necessary (output_fd is a pipe pointing to the child's standard input), and to time this to avoid deadlock, it still needs to be interleaved with sendfile(1, input_fd, NULL, SIZE_MAX); Hope that helps. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html