git-bundle rev handling and de-duping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[subject tweaked as we have veered quite far off the original, and
 this might get more attention from interested people]

On Thu, Oct 16, 2014 at 10:39:54AM -0700, Junio C Hamano wrote:

>  2.  We use object_array_remove_duplicates() to de-dup "git bundle
>      create x master master", which came from b2a6d1c6 (bundle:
>      allow the same ref to be given more than once, 2009-01-17),
>      which is still the sole caller of the function, and I think
>      this is bogus.  Comparing .name would not de-dup "git bundle
>      create x master refs/heads/master".

Hmm. We also doesn't respect the UNINTERESTING flag, so it will dedup
"foo" and "^foo" into a single entry. I _think_ this doesn't have any
bad side effects, because they are equivalent (i.e., the flag is carried
on the object, not the pending entry). I would expect this:

  git bundle create ... foo ^foo

to barf (and it does) because the bundle is empty. But with this:

  git bundle create ... another-ref foo ^foo

I would expect the resulting bundle to contain the objects from
another-ref, but still mention "foo" as an update (we didn't need to
send any objects, since presumably the other side already had them). It
doesn't, but that is not because of the dedup. It is because we generate
the list of refs to send based on the pending list, and we skip all
UNINTERESTING objects (we must, because otherwise "^foo" would make an
entry). I.e., it is the same "the flag is on the object, not the pending
entry" that saved us above now causing its own bug. Obviously the
example above is sort of silly, but if you have:

  # two branches point at the same object
  git branch foo master

  # the other side already has master. Let's send them foo.
  # this will fail because the bundle is empty. That's mildly
  # annoying because we really want to tell them "hey, update
  # your foo branch". But at least we get an error.
  git bundle create tmp.bundle foo ^master

  # now the same thing, but we're sending them some other objects.
  # This one succeeds, but silently omits foo from the bundle!
  git bundle create tmp.bundle foo another-ref ^master

I have a feeling that the list needs to be generated from revs.cmdline,
not revs.pending, and the de-duplication needs to happen there (with the
ref resolution that you mention).

I also have the feeling that fast-export had to deal with this exact
same issue. It looks like we use revs.cmdline there. I seem to recall
there was some ongoing work in that area, but I stopped paying close
attention due to some personality conflicts, and I don't know if all of
the issues were ever resolved.

> I think the right way to fix these two and a half problems is to do
> the following:
> 
>  - object_array_remove_duplicates() (and contains_name() helper it
>    uses) should be removed from object.c;
> 
>  - create_bundle() in bundle.c should implement a helper that is
>    similar to contains_name() but knows about ref dwimming and use
>    it to call object_array_filter() to replace its call to
>    object_array_remove_duplicates().

Agreed. The loop in create_bundle right after the de-dup does the
dwimming. Probably it would be simple to just skip duplicates there
using a hashmap or sorted list.

> I am not doing this myself, and I do not expect either you or
> Michael to do so, either.  I am just writing this down to point out
> a low hanging fruit to aspiring new contributors (hint, hint).

I am also not planning on working on it soon, but now we have hopefully
fed plenty of possibilities to anybody who wants to. :)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]