> > diff --git a/remote-curl.c b/remote-curl.c
> > index 32c133f636..13836e4c28 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -504,6 +504,18 @@ struct rpc_state {
> >  	int any_written;
> >  	unsigned gzip_request : 1;
> >  	unsigned initial_buffer : 1;
> > +
> > +	/*
> > +	 * Whenever a pkt-line is read into buf, append the 4 characters
> > +	 * denoting its length before appending the payload.
> > +	 */
> > +	unsigned write_line_lengths : 1;
>
> Hmm, so we read a packet, and then we "append its length" before
> appending the contents. But that would always be the length we just
> read, right? I wonder if it would be simpler to just call this option
> something like "proxy_packets" or "full_packets", teach the packet code
> to give us the full packets, and then just treat that whole buffer as a
> unit. I dunno. There might be some gotchas in practice, and it's not
> like it's that much simpler. Just a thought.

Yes, the length is the length we just read. And it might not even be
simpler - this shifts the complexity to the code that does not use the 4
characters (unless we always return 2 pointers, which seems redundant,
and which now changes the problem of ensuring that the correct pointer
is used). I think that this is a good default - proxying still seems
rarer than just consuming payloads.

I'm OK with changing the name, although I think that both "proxy" and
"full" are less clear than "write_line_lengths" - aren't you still
proxying even if you're changing the format a little, and isn't a packet
"full" even without the line lengths?

> > +	/*
> > +	 * rpc_out uses this to keep track of whether it should continue
> > +	 * reading to populate the current request. Initialize to 0.
> > +	 */
> > +	unsigned stop_reading : 1;
>
> OK, so we need this because the v2 proxying will require us to stop
> reading but keep the channel open? Kind of awkward, but I don't see a
> way around it.

I'll improve the comments here and elsewhere in the next version.
This basically means that "we read a flush but we haven't sent the 0000
to libcurl yet, so don't read anything more until we have sent the
0000", and "flush_read_but_not_sent" is probably a better name.

> > @@ -531,15 +580,32 @@ static size_t rpc_out(void *ptr, size_t eltsize,
> >  	size_t max = eltsize * nmemb;
> >  	struct rpc_state *rpc = buffer_;
> >  	size_t avail = rpc->len - rpc->pos;
> > +	enum packet_read_status status;
> >
> >  	if (!avail) {
> >  		rpc->initial_buffer = 0;
> >  		rpc->len = 0;
> > -		if (!rpc_read_from_out(rpc, &avail))
> > -			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
> > -		if (!avail)
> > -			return 0;
> >  		rpc->pos = 0;
> > +		if (!rpc->stop_reading) {
> > +			if (!rpc_read_from_out(rpc, 0, &avail, &status))
> > +				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
>
> Do we actually need it to be LARGE_PACKET_MAX+4 here? I guess not,
> because LARGE_PACKET_DATA_MAX is the "-4" version. So I think this BUG()
> was perhaps already wrong?

In this patch, yes (if not, the non-chunked code is useless).
Previously, the BUG should have been LARGE_PACKET_DATA_MAX, yes. The
BUG() was introduced in patch 4 of this set - I'll update that one to
LARGE_PACKET_DATA_MAX and this one to LARGE_PACKET_MAX.

> > +			if (status == PACKET_READ_FLUSH)
> > +				/*
> > +				 * We are done reading for this request, but we
> > +				 * still need to send this line out (if
> > +				 * rpc->write_line_lengths is true) so do not
> > +				 * return yet.
> > +				 */
> > +				rpc->stop_reading = 1;
> > +		}
> > +	}
> > +	if (!avail && rpc->stop_reading) {
> > +		/*
> > +		 * "return 0" will notify Curl that this RPC request is done,
> > +		 * so reset stop_reading back to 0 for the next request.
> > +		 */
> > +		rpc->stop_reading = 0;
> > +		return 0;
>
> OK, and here's where we handle the stop_reading thing. It is indeed
> awkward, but I think your comments make it clear what's going on.
>
> If we get stop_reading, do we care about "avail"?
> I.e., shouldn't we be
> able to return non-zero to say "we got the whole input, this is not a
> too-large request"?

This code is in rpc_out(), which is a callback passed as
CURLOPT_READFUNCTION. So returning non-zero means "send these bytes",
and returning zero means EOF.

We set stop_reading when we receive a flush, as you can see from the
quoted code snippet. But this does not mean that the buffer is empty -
the buffer may contain "0000" (if rpc->write_line_lengths is true, as
the comment states). We have to let libcurl repeatedly call this
function until all of the "0000" is sent (and return non-zero each
time). But once all of the "0000" is sent - we know this by avail being
zero - and libcurl calls this function once more, we have to remember to
do nothing except to reset stop_reading and return 0 to indicate EOF.

> > +test_expect_success 'clone big repository with http:// using protocol v2' '
> > +	test_when_finished "rm -f log" &&
> > +
> > +	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
> > +	# Ensure that the list of wants is greater than http.postbuffer below
> > +	for i in $(seq 1 1500)
> > +	do
> > +		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
> > +	done &&
>
> As Junio noted, this should be test_seq. But I think it would be nice to
> avoid looping on test_commit here at all. It kicks off at least 3
> processes; multiplying that by 1500 is going to be slow.
>
> Making a big input is often much faster by generating a fast-import
> stream (which can often be done entirely in-shell). There's some prior
> art in t3302, t5551, t5608, and others.

OK - I'll look at generating a fast-import stream.

Thanks for all your comments.
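
For reference, a rough and untested sketch of what the fast-import
approach might look like. The helper name gen_big_stream, the branch
name, and the committer identity are placeholders invented for this
illustration, not anything from the patch. The idea is to emit one
commit plus one tag per iteration using only shell builtins, so that the
repository ends up with about as many refs (and therefore wants) as the
test_commit loop would have produced.

```shell
# Hypothetical helper: print a fast-import stream containing $1 commits,
# each of which is also pointed to by its own tag (commit1, commit2, ...).
# Only shell builtins are used, so no extra process is spawned per commit.
gen_big_stream () {
	n=$1
	i=1
	while test "$i" -le "$n"
	do
		# Branch name and identity below are placeholders.
		echo "commit refs/heads/master"
		echo "mark :$i"
		echo "committer A U Thor <author@example.com> $i +0000"
		echo "data <<EOM"
		echo "commit$i"
		echo "EOM"
		echo
		# Give every commit its own ref so it shows up as a want.
		echo "reset refs/tags/commit$i"
		echo "from :$i"
		echo
		i=$((i + 1))
	done
}

# Possible use in the test (assumption, not run here):
#   gen_big_stream 1500 |
#   git -C "$HTTPD_DOCUMENT_ROOT_PATH/big" fast-import
```

The commits here carry no file changes, which should be enough if all
the test needs is a long list of wants; whether 1500 of them still
exceeds the http.postbuffer value used later would need to be checked.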