Re: [PATCH 1/7] v0 protocol: fix infinite loop when parsing multi-valued capabilities

Jeff King <peff@xxxxxxxx> · Wed, 12 Apr 2023 02:46:51 -0400

On Wed, Apr 12, 2023 at 02:29:24AM -0400, Jeff King wrote:

> But on the second call, now *offset is set to that larger index, which
> lets us skip past the first "symref" capability. However, we do so by
> incrementing feature_list. That means our pointer difference is now too
> small; it is counting from where we resumed parsing, not from the start
> of the original feature_list pointer. And because we incremented
> feature_list only inside our function, and not the caller, that
> increment is lost next time the function is called.
> 
> The simplest solution is to account for those skipped bytes by
> incrementing *offset, rather than assigning to it. (The other possible
> solution is to add an extra level of pointer indirection to feature_list
> so that the caller knows we moved it forward, but that seems more
> complicated).

Hmph. So after convincing myself that was the end of it, now I'm not so
sure. We also increment feature_list if we find a false positive in the
middle of another entry. I.e., the code even after this patch looks
like:

          while (*feature_list) {
                  const char *found = strstr(feature_list, feature);
                  if (!found)
                          return NULL;
                  if (feature_list == found || isspace(found[-1])) {
                          const char *value = found + len;
                          /* feature with no value (e.g., "thin-pack") */
                          if (!*value || isspace(*value)) {
                                  if (lenp)
                                          *lenp = 0;
                                  if (offset)
                                          *offset += found + len - feature_list;
                                  return value;
                          }
                          /* feature with a value (e.g., "agent=git/1.2.3") */
                          else if (*value == '=') {
                                  size_t end;

                                  value++;
                                  end = strcspn(value, " \t\n");
                                  if (lenp)
                                          *lenp = end;
                                  if (offset)
                                          *offset += value + end - feature_list;
                                  return value;
                          }
                          /*
                           * otherwise we matched a substring of another feature;
                           * keep looking
                           */
                  }
                  feature_list = found + 1;
          }
          return NULL;

So if we have something like:

   agent=i-like-symrefs symref=HEAD:refs/heads/foo

then we'd find the "symref" value in the agent line, increment
feature_list, and then find the real one. But our pointer difference
will again be too short! And incrementing "offset" rather than assigning
it won't help, because those skipped bytes are not accounted for in the
existing value of "offset".

So what we probably want is a third possibility I didn't allow for: keep
the original value of feature_list intact, and use a separate pointer to
increment. And then assigning "*offset = value + end - feature_list"
will always be correct, because the offset will always be from the
original, true beginning of the string.

The fix is easy, but let me see if I can come up with a test.

-Peff