Re: [PATCH v3 1/3] Add bidirectional_transfer_loop()

Jonathan Nieder <jrnieder@xxxxxxxxx> · Fri, 1 Oct 2010 00:11:25 -0500

Hi,

Ilari Liusvaara wrote:

> This helper function copies bidirectional stream of data between
> stdin/stdout and specified file descriptors.

>From this description, I am expecting something to the effect of

 sendfile(output, input, NULL, SIZE_MAX);

but this is much longer than that.  Why?  Let's see...

> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -862,3 +862,257 @@ int transport_helper_init(struct transport *transport, const char *name)
>  	transport->smart_options = &(data->transport_options);
>  	return 0;
>  }
> +
> +
> +#define BUFFERSIZE 4096
> +#define PBUFFERSIZE 8192

Magic numbers.  Where do these come from?  Were they tuned or are they
off the top of the head?  (Either is fine; it just is nice to know.)

> +/* Print bidirectional transfer loop debug message. */
> +static void transfer_debug(const char *fmt, ...)
> +{
> +	va_list args;
> +	char msgbuf[PBUFFERSIZE];
> +	static int debug_enabled = -1;
> +
> +	if (debug_enabled < 0)
> +		debug_enabled = getenv("GIT_TRANSLOOP_DEBUG") ? 1 : 0;
> +	if (!debug_enabled)
> +		return;
> +
> +	sprintf(msgbuf, "Transfer loop debugging: ");
> +	va_start(args, fmt);
> +	vsprintf(msgbuf + strlen(msgbuf), fmt, args);
> +	va_end(args);
> +	fprintf(stderr, "%s\n", msgbuf);
> +}

Why this instead of just vfprintf?  (vreportf() in usage.c
does the same thing; maybe there is some portability reason?)

This can overflow if the caller is not careful.

Might be clearer to write

	va_start(args, fmt);
	vsnprintf(msgbuf, sizeof(msgbuf), fmt, args);
	va_end(args);
	vfprintf(stderr, "Transfer loop debugging: %s\n", msgbuf);

like vreportf() does.

> +/* Load the parameters into poll structure. Return number of entries loaded */

Not clear to me what this comment means.  What is this function used for?

> +static int load_poll_params(struct pollfd *polls, size_t inbufuse,
> +	size_t outbufuse, int in_hup, int out_hup, int in_closed,
> +	int out_closed, int socket_mode, int input_fd, int output_fd)

Scary.  Maybe a params struct would make callers easier to understand?

What the parameters mean (I'm guessing):

	polls - buffer for file descriptors to poll on.  There should
	        be room for 4.
	inbufuse - < BUFFERSIZE if we can handle more input on stdin
	outbufuse - < BUFFERSIZE if we can handle more input on input_fd
		--- why is this called "out"?
	in_hup - whether the other end of stdin has already hung up
	out_hup - whether the other end of input_fd has already hung up
	in_closed - ???
	out_closed - ???
	socket_mode - true if input_fd == output_fd
	input_fd, output_fd - file descriptors

[...]
> +	if (!in_hup && inbufuse < BUFFERSIZE) {
> +		stdin_index = nextindex++;
> +		polls[stdin_index].fd = 0;
> +		transfer_debug("Adding stdin to fds to wait for");
> +	}
> +	if (!out_hup && outbufuse < BUFFERSIZE) {
> +		input_index = nextindex++;
> +		polls[input_index].fd = input_fd;
> +		transfer_debug("Adding remote input to fds to wait for");
> +	}
> +	if (!out_closed && outbufuse > 0) {
> +		stdout_index = nextindex++;
> +		polls[stdout_index].fd = 1;
> +		transfer_debug("Adding stdout to fds to wait for");
> +	}

Repetitive.  Maybe this could be factored out as a mini-function:

	consider_waiting_for(in_hup, inbufuse, &stdin_index,
	                     nextindex, polls, 0, "stdin");

Hmm, never mind.

[...]
> +}
> +
> +static int transfer_handle_events(struct pollfd* polls, char *in_buffer,
> +	char *out_buffer, size_t *in_buffer_use, size_t *out_buffer_use,
> +	int *in_hup, int *out_hup, int *in_closed, int *out_closed,
> +	int socket_mode, int poll_count, int input, int output)

Scarier.

> +{
> +	int i, r;
> +	for(i = 0; i < poll_count; i++) {

Long loop.  What is it for?  (A comment might help.)  Maybe the body
could get its own function?

> +		/* Handle stdin. */
> +		if (polls[i].fd == 0 && polls[i].revents & (POLLIN | POLLHUP)) {

Or better, the code to handle each event might get its own function.

This one reads as much data as possible into in_buffer[], taking
care to handle EOF appropriately.

[...]
> +		}
> +
> +		/* Handle remote end input. */
> +		if (polls[i].fd == input &&
> +			polls[i].revents & (POLLIN | POLLHUP)) {

This one reads as much data as possible into out_buffer[], taking
care to handle EOF appropriately.  Presumably out_buffer[] means
data scheduled for output.

> +			transfer_debug("remote input is readable");
> +			r = read(input, out_buffer + *out_buffer_use,
> +				BUFFERSIZE - *out_buffer_use);
> +			if (r < 0 && errno != EWOULDBLOCK && errno != EAGAIN &&
> +				errno != EINTR) {
> +				perror("read(connection) failed");
> +				return 1;

Why use perror() instead of error() which can be overridden by
setting error_routine?  Why return 1 instead of the usual -1
for error?

> +			} else if (r == 0) {
> +				transfer_debug("remote input EOF");
> +				*out_hup = 1;
> +				if (!*out_buffer_use) {
> +					close(1);
> +					*out_closed = 1;
> +					transfer_debug("Closed stdout");
> +				} else
> +					transfer_debug("Delaying stdout close because output buffer has data");

Why is stdout closed here?  Could that be taken care of later
by checking *out_hup?

[...]
> +			r = write(1, out_buffer, *out_buffer_use);
[...]
> +				if (*out_buffer_use > 0)
> +					memmove(out_buffer, out_buffer + r,
> +						*out_buffer_use);

This only writes as much data to stdout as one write() allows,
to avoid deadlock, presumably.

> +				if (*out_hup && !*out_buffer_use) {
> +					close(1);

After each write() we check *out_hup, but only after a write.  Would
be simpler to unconditionally check *out_hup and avoid the duplicate
code before; would that slow this down?

> +		/* Handle remote end output. */

Dual to the above.

[...]
> +/* Copy data from stdin to output and from input to stdout. */

Ah.  I think this belongs in the commit message, too. :)

Still I wonder "why".  What is the motivational example?

> +int bidirectional_transfer_loop(int input, int output)
> +{
[...]
> +	while (1) {

A typical poll loop.  Nothing scary here.

> +		int r;
> +		poll_count = load_poll_params(polls, in_buffer_use,
> +			out_buffer_use, in_hup, out_hup, in_closed, out_closed,
> +			socket_mode, input, output);
> +		if (!poll_count) {
> +			transfer_debug("Transfer done");
> +			break;

			return 0;

would avoid the reader having to scroll down, I think.

Okay, so we actually have the effect of

 sendfile(output_fd, 0, NULL, SIZE_MAX);
 sendfile(1, input_fd, NULL, SIZE_MAX);

interleaved to avoid deadlock.  In other words, this interchanges
input for output file descriptors.  The main application is remote-fd,
which needs to do this to forward input to and output from a parent
process.

For remote-ext, for a moment one might imagine one could avoid the
trouble by letting the child inherit stdin and stdout.  Unfortunately,
remote-ext needs to be able to prepend some data to its child's
input stream.  So the effect of

 sendfile(output_fd, 0, NULL, SIZE_MAX);

is still necessary (output_fd is a pipe pointing to the child's
standard input), and to time this to avoid deadlock, it still needs
to be interleaved with

 sendfile(1, input_fd, NULL, SIZE_MAX);

Hope that helps.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html