On Thu, Feb 14, 2019 at 11:06:39AM -0800, Jonathan Tan wrote:

> diff --git a/remote-curl.c b/remote-curl.c
> index 32c133f636..13836e4c28 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -504,6 +504,18 @@ struct rpc_state {
>  	int any_written;
>  	unsigned gzip_request : 1;
>  	unsigned initial_buffer : 1;
> +
> +	/*
> +	 * Whenever a pkt-line is read into buf, append the 4 characters
> +	 * denoting its length before appending the payload.
> +	 */
> +	unsigned write_line_lengths : 1;

Hmm, so we read a packet, and then we "append its length" before
appending the contents. But that would always be the length we just
read, right?

I wonder if it would be simpler to just call this option something like
"proxy_packets" or "full_packets", teach the packet code to give us the
full packets, and then just treat that whole buffer as a unit.

I dunno. There might be some gotchas in practice, and it's not like it's
that much simpler. Just a thought.

> +	/*
> +	 * rpc_out uses this to keep track of whether it should continue
> +	 * reading to populate the current request. Initialize to 0.
> +	 */
> +	unsigned stop_reading : 1;

OK, so we need this because the v2 proxying will require us to stop
reading but keep the channel open? Kind of awkward, but I don't see a
way around it.

> +static int rpc_read_from_out(struct rpc_state *rpc, int options,
> +			     size_t *appended,
> +			     enum packet_read_status *status) {
> +	size_t left;
> +	char *buf;
> +	int pktlen_raw;
> +
> +	if (rpc->write_line_lengths) {
> +		left = rpc->alloc - rpc->len - 4;
> +		buf = rpc->buf + rpc->len + 4;
> +	} else {
> +		left = rpc->alloc - rpc->len;
> +		buf = rpc->buf + rpc->len;
> +	}

OK, so we push the packets 4 bytes further into the buffer in that case,
leaving room for the header. Makes sense.
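
Just to spell out the layout I have in mind, here is a standalone
sketch. It is not the code from the patch: append_pkt() and
fill_header() are made-up names, and fill_header() only approximates
what I'd expect set_packet_header() to do (total length, header
included, as 4 hex digits). It assumes the payload is already in hand
rather than coming from packet_read_with_status().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for set_packet_header(): write total_len
 * (which includes the 4 header bytes) as 4 lowercase hex digits.
 */
static void fill_header(char *hdr, unsigned total_len)
{
	static const char hex[] = "0123456789abcdef";

	hdr[0] = hex[(total_len >> 12) & 0xf];
	hdr[1] = hex[(total_len >> 8) & 0xf];
	hdr[2] = hex[(total_len >> 4) & 0xf];
	hdr[3] = hex[total_len & 0xf];
}

/*
 * Append one payload to buf at offset *len as a complete pkt-line.
 * The payload is placed 4 bytes in, and the header is back-filled
 * once the length is known. Returns the number of bytes appended,
 * or 0 if there is not enough room.
 */
static size_t append_pkt(char *buf, size_t alloc, size_t *len,
			 const char *payload, size_t payload_len)
{
	char *dst;

	assert(*len <= alloc);
	if (alloc - *len < payload_len + 4)
		return 0;

	dst = buf + *len + 4;		/* leave room for the header */
	memcpy(dst, payload, payload_len);
	fill_header(dst - 4, (unsigned)(payload_len + 4));

	*len += payload_len + 4;
	return payload_len + 4;
}
```

With a 6-byte payload of "hello\n", that should leave "000ahello\n"
in the buffer and advance len by 10.
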

>  	if (left < LARGE_PACKET_MAX)
>  		return 0;
>
> -	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
> -	rpc->len += *appended;
> +	*status = packet_read_with_status(rpc->out, NULL, NULL, buf,
> +			left, &pktlen_raw, options);
> +	if (*status != PACKET_READ_EOF) {
> +		*appended = pktlen_raw + (rpc->write_line_lengths ? 4 : 0);
> +		rpc->len += *appended;
> +	}
> +
> +	if (rpc->write_line_lengths) {
> +		switch (*status) {
> +		case PACKET_READ_EOF:
> +			if (!(options & PACKET_READ_GENTLE_ON_EOF))
> +				die("shouldn't have EOF when not gentle on EOF");
> +			break;
> +		case PACKET_READ_NORMAL:
> +			set_packet_header(buf - 4, *appended);
> +			break;
> +		case PACKET_READ_DELIM:
> +			memcpy(buf - 4, "0001", 4);
> +			break;
> +		case PACKET_READ_FLUSH:
> +			memcpy(buf - 4, "0000", 4);
> +			break;
> +		}
> +	}

And here we fill it in. Makes sense. It's a little awkward that we have
to re-translate READ_DELIM, etc, back into their headers.

> @@ -531,15 +580,32 @@ static size_t rpc_out(void *ptr, size_t eltsize,
>  	size_t max = eltsize * nmemb;
>  	struct rpc_state *rpc = buffer_;
>  	size_t avail = rpc->len - rpc->pos;
> +	enum packet_read_status status;
>
>  	if (!avail) {
>  		rpc->initial_buffer = 0;
>  		rpc->len = 0;
> -		if (!rpc_read_from_out(rpc, &avail))
> -			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
> -		if (!avail)
> -			return 0;
>  		rpc->pos = 0;
> +		if (!rpc->stop_reading) {
> +			if (!rpc_read_from_out(rpc, 0, &avail, &status))
> +				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");

Do we actually need it to be LARGE_PACKET_MAX+4 here? I guess not,
because LARGE_PACKET_DATA_MAX is the "-4" version. So I think this BUG()
was perhaps already wrong?

> +			if (status == PACKET_READ_FLUSH)
> +				/*
> +				 * We are done reading for this request, but we
> +				 * still need to send this line out (if
> +				 * rpc->write_line_lengths is true) so do not
> +				 * return yet.
> +				 */
> +				rpc->stop_reading = 1;
> +		}
> +	}
> +	if (!avail && rpc->stop_reading) {
> +		/*
> +		 * "return 0" will notify Curl that this RPC request is done,
> +		 * so reset stop_reading back to 0 for the next request.
> +		 */
> +		rpc->stop_reading = 0;
> +		return 0;

OK, and here's where we handle the stop_reading thing. It is indeed
awkward, but I think your comments make it clear what's going on.

If we get stop_reading, do we care about "avail"? I.e., shouldn't we be
able to return non-zero to say "we got the whole input, this is not a
too-large request"?

> +test_expect_success 'clone big repository with http:// using protocol v2' '
> +	test_when_finished "rm -f log" &&
> +
> +	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
> +	# Ensure that the list of wants is greater than http.postbuffer below
> +	for i in $(seq 1 1500)
> +	do
> +		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
> +	done &&

As Junio noted, this should be test_seq. But I think it would be nice to
avoid looping on test_commit here at all. It kicks off at least 3
processes; multiplying that by 1500 is going to be slow.

Making a big input is often much faster by generating a fast-import
stream (which can often be done entirely in-shell). There's some prior
art in t3302, t5551, t5608, and others.

-Peff