On Wed, Apr 03, 2024 at 08:30:54AM +0200, Patrick Steinhardt wrote: > On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote: > > In get_active_slot(), we return a CURL handle that may have been used > > before (reusing them is good because it lets curl reuse the same > > connection across many requests). We set a few curl options back to > > defaults that may have been modified by previous requests. > > > > We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which > > defaults to "-1"). This usually doesn't matter because most POSTs will > > set both fields together anyway. But there is one exception: when > > handling a large request in remote-curl's post_rpc(), we don't set > > _either_, and instead set a READFUNCTION to stream data into libcurl. > > > > This can interact weirdly with a stale POSTFIELDSIZE setting, because > > curl will assume it should read only some set number of bytes from our > > READFUNCTION. However, it has worked in practice because we also > > manually set a "Transfer-Encoding: chunked" header, which libcurl uses > > as a clue to set the POSTFIELDSIZE to -1 itself. > > > > So everything works, but we're better off resetting the size manually > > for a few reasons: > > > > - there was a regression in curl 8.7.0 where the chunked header > > detection didn't kick in, causing any large HTTP requests made by > > Git to fail. This has since been fixed (but not yet released). In > > the issue, curl folks recommended setting it explicitly to -1: > > > > https://github.com/curl/curl/issues/13229#issuecomment-2029826058 > > > > and it indeed works around the regression. So even though it won't > > be strictly necessary after the fix there, this will help folks who > > end up using the affected libcurl versions. > > > > - it's consistent with what a new curl handle would look like. Since > > get_active_slot() may or may not return a used handle, this reduces > > the possibility of heisenbugs that only appear with certain request > > patterns. > > > > Note that the recommendation in the curl issue is to actually drop the > > manual Transfer-Encoding header. Modern libcurl will add the header > > itself when streaming from a READFUNCTION. However, that code wasn't > > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST > > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to > > support back to 7.19.5, so those older versions still need the manual > > header. > > > > Signed-off-by: Jeff King <peff@xxxxxxxx> > > --- > > http.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/http.c b/http.c > > index e73b136e58..3d80bd6116 100644 > > --- a/http.c > > +++ b/http.c > > @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void) > > curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); > > curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); > > curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); > > + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L); > > curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); > > curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); > > curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1); > > Can't we refactor this code to instead use `curl_easy_reset()`? That > function already resets most of the data we want to reset and would also > end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So > wouldn't the following be a more sensible fix? > > diff --git a/http.c b/http.c > index e73b136e58..e5f5bc23db 100644 > --- a/http.c > +++ b/http.c > @@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void) > slot->finished = NULL; > slot->callback_data = NULL; > slot->callback_func = NULL; > + curl_easy_reset(slot->curl); > curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file); > if (curl_save_cookies) > curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file); > curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header); > curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions); > curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr); > - curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); > - curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); > curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1); > - curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL); > > /* > * Default following to off unless "ALWAYS" is configured; this gives Oh well, the answer is "no", or at least not as easily as this, as the failing tests tell us. I guess it resets more data than we actually want it to reset, but I didn't dig any deeper than that. Patrick
Attachment:
signature.asc
Description: PGP signature