Re: [PATCH] promisor-remote: skip move_to_tail when n=1

Emily Shaffer <emilyshaffer@xxxxxxxxxx> · Thu, 26 Sep 2019 10:53:08 -0700

On Thu, Sep 26, 2019 at 03:55:35AM -0400, Jeff King wrote:
> On Wed, Sep 25, 2019 at 02:37:18PM -0700, Emily Shaffer wrote:
> 
> > Previously, when promisor_remote_move_to_tail() is called for a
> > promisor_remote which is currently the *only* element in promisors, a
> > cycle is created in the promisors linked list. This cycle leads to a
> > double free later on in promisor_remote_clear(): promisors is set to
> > promisors->next (a no-op, as promisors->next == promisors); the previous
> > value of promisors is free()'d; then the new value of promisors (which
> > is equal to the previous value of promisors) is also free()'d. This
> > double-free error was unrecoverable for the user without removing the
> > filter or re-cloning the repo and hoping to miss this edge case.
> 
> Nice clear description.
> 
> Is the problem only when the remote is the only element of the list, or
> would it also happen when it's at the end?

It totally does. Nice catch, I wasn't seeing past my specific example.
:)

> 
> I think the problematic lines are:
> 
>   r->next = NULL;
>   *promisors_tail = r;
> 
> If our "r" is already the tail, then promisors_tail points to &r->next,
> and we create a cycle in the linked list.
> 
> I think if you swap those two lines, the problem goes away. That said,
> it's subtle enough that I think covering the noop case at the start of
> the function is much clearer.
> 
> Using that strategy, I think this:
> 
> > +	if (promisors == r && promisors->next == NULL)
> > +		return;
> 
> should probably just see if we're already at the end, which also covers
> the single-element case. Like:
> 
>   if (!r->next)
> 	return; /* we're already at the end */

Hmm, I guess I wasn't familiar enough on the lifetime of a
promisor_remote - I suppose I was expecting
promisor_remote_move_to_tail() could be used for a first-time insert,
too, although it looks like promisor_remote_new() actually does the
insert for us every time.

> 
> or possibly:
> 
>   if (promisors_tail == &r->next)
> 	return; /* we're already at the end */

With the above concern I initially feel a little more comfortable with
this, although now that I'm thinking through the case when 'r' isn't
already in the list, I think it would replace the entire list by taking
the 'else' branch, having a nulled r->next, and therefore replacing the
head pointer 'promisors' with itself.

So, considering that in the former approach incorrect usage (I calloc my
own promisor_remote and call move_to_tail() on it) is a no-op, and in
the latter approach the same incorrect use case is disastrous (we leak
any promisor_remotes already in the list), I'll stick with the former.

Thanks for the suggestions and for pointing out my edge case was wider
than I thought!

> 
> I also can't help but think this would all be a lot simpler using the
> implementation in list.h. Then we don't have to pass this weird
> "previous" pointer around (because it's a doubly-linked list). And
> functions like this one could go away in favor of list_move(). But
> that's obviously a bigger change.

Agreed. I joked to my team that this was the first time I've needed to
understand linked list manipulation outside of an interview setting,
ever ;)

> 
> > This change showed up for us in a user bugreport; I'm actually fairly
> > unfamiliar with the codebase here but given the drastic nature of the
> > failure, I wanted to get a fix up quickly. I'm still working on how to
> > reproduce this exact case in the test suite (and actually would
> > appreciate any pointers). Specifically, it looks like we only really
> > break if we have a single promisor_remote in the linked list, call
> > move_to_tail() on it at least once, and then call clear() on it without
> > adding another promisor_remote first.
> 
> The only call is in promisor_remote_init(), where we try to move
> whatever promisor we get from looking up repository_format_partial_clone.
> That name in turn comes from the extensions.partialclone config.
> 
> The init function also creates elements in the list from any remotes
> marked with remote.XYZ.promisor in the config.
> 
> So this is enough to create the cycle in the linked list:
> 
>   git init
>   git remote add foo /wherever
>   git config remote.foo.promisor true
>   git config extensions.partialclone foo
>   git fetch foo
> 
> but it doesn't trigger the double-free because we don't ever free
> anything. We only call the clear() function during reinit(). And that
> only happens in partial_clone_register(). So if we take the commands
> above, and then try to actually do a partial clone with it (causing it
> to re-register), I get (building with ASan, but vanilla glibc seems to
> detect the double-free too):
> 
>   $ git fetch --filter=blob:none foo
>   ==30356==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000001f90 at pc 0x559adb9a0919 bp 0x7ffeb7a52b30 sp 0x7ffeb7a52b28
>   READ of size 8 at 0x603000001f90 thread T0
>     #0 0x559adb9a0918 in promisor_remote_clear /home/peff/compile/git/promisor-remote.c:173
>     #1 0x559adb9a0918 in promisor_remote_reinit /home/peff/compile/git/promisor-remote.c:183
>     #2 0x559adb5856c0 in fetch_one_setup_partial builtin/fetch.c:1570
>     #3 0x559adb5856c0 in cmd_fetch builtin/fetch.c:1736
>     [...etc...]
> 
> Another way to manifest the error:
> 
>   git remote add bar /does-not-matter
>   git fetch --filter=blob:none bar
> 
> Now we'll try to register "bar". But since it's _not_ already in the
> linked list, our search will go on forever, since the list loops back on
> itself.

I really appreciate your coming up with these cases. I'll test-ify them
and send another patch, plus the modified fix as discussed as above.

 - Emily