Re: clones over rsync broken?

On Sat, Jan 30, 2016 at 05:11:33AM +0000, Eric Wong wrote:

> I have not used rsync remotes in ages, but I was working on the
> patch for -4/-6 support and decided to test it against rsync.kernel.org
> 
> Cloning git.git takes forever and failed with:

No kidding. There are over 95,000 unreachable loose objects consuming a
gigabyte. The rsync transport blindly pulls all of the data over, with
no idea that it doesn't need most of it.

> $ git clone rsync://rsync.kernel.org/pub/scm/git/git.git
> Checking connectivity... fatal: bad object ecdc6d8612df80e871ed34bb6c3b01b20b0b82e6
> fatal: remote did not send all necessary objects

All those objects, and we still manage to miss one. :)

Interestingly, that object does not seem to exist at all on the remote!
I think this is the same bug as the one below. Read on...

> However, trying to clone a smaller repo like pahole.git via rsync fails
> differently; this looks more like a git bug:
> 
> $ git clone rsync://rsync.kernel.org/pub/scm/devel/pahole/pahole.git
> fatal: Multiple updates for ref 'refs/remotes/origin/master' not allowed.
> 
> Using rsync(1) manually to grab pahole.git and inspecting the bare
> repo with "git fsck --full" yields no anomalies.
> $GIT_DIR/info/refs and $GIT_DIR/packed-refs both look fine, but
> perhaps it's confused by the existence of $GIT_DIR/refs/heads/master
> as a loose ref?

Yes, that's exactly what's going on. In get_refs_via_rsync, we blindly
concatenate the list of loose refs and packed refs. But that's not
right, and never has been. If the same ref exists in both stores, the
loose ref takes precedence (that is how we can write new refs without
having to rewrite the whole packed-refs file).

So we erroneously believe that refs/heads/master exists _twice_ on the
remote, with two different values (and try to store it twice as
refs/remotes/origin/master). But we should be accepting only the loose
value.
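
To make the failure concrete, here is a rough sketch in Python (not git's
actual C code) of what blind concatenation does to the ref list. The
function names and the object ids are made up for illustration; only the
ref names mirror the pahole.git case.

```python
from collections import Counter


def concat_refs(loose, packed):
    """Mimic the bug: list every loose ref, then every packed ref."""
    return list(loose) + list(packed)


def duplicated_names(ref_list):
    """Return the ref names that appear more than once in the list."""
    counts = Counter(name for name, _ in ref_list)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical object ids; a real repository would have real SHA-1s here.
loose = [("refs/heads/master", "a" * 40)]
packed = [("refs/heads/master", "b" * 40), ("refs/tags/v1", "c" * 40)]

remote_refs = concat_refs(loose, packed)
print(duplicated_names(remote_refs))
```

Any name reported here would be mapped to the same
refs/remotes/origin/* ref twice, which is what the "Multiple updates"
check refuses.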

This explains the git.git problem, too. There are two entries for
refs/heads/pu: one loose and one in packed-refs. The latter is a stale,
older value, and should never be looked at. But because pu gets rewound,
its older values are not necessarily reachable and may even have been
pruned!

So no, we do not have ecdc6d86, but neither does the upstream, and
nothing is referencing it.

It looks like this has been broken since cd547b4 (fetch/push: readd
rsync support, 2007-10-01). The fix is just to ignore packed-refs
entries which duplicate loose ones. But given the length of time this
has been broken with nobody complaining, I have to wonder if it is
simply time to retire the rsync protocol. Even if it were made to work, it
is a horribly inefficient protocol.
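
For reference, the fix described above (ignoring packed-refs entries that
duplicate loose ones) amounts to something like the following. This is
only a sketch in Python, not a patch against transport.c; the helper
names are hypothetical, and it assumes the usual packed-refs layout of
"<sha> <refname>" lines plus "#" header lines and "^<sha>" peeled-tag
lines.

```python
def parse_packed_refs(text):
    """Parse packed-refs content into a {refname: sha} dict."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the "# pack-refs with:" header, and peeled-tag lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, _, name = line.partition(" ")
        refs[name] = sha
    return refs


def effective_refs(loose, packed_text):
    """Combine both stores, letting a loose ref shadow a packed one."""
    refs = {
        name: sha
        for name, sha in parse_packed_refs(packed_text).items()
        if name not in loose  # drop packed entries that a loose ref overrides
    }
    refs.update(loose)
    return refs


# Hypothetical contents; the object ids are placeholders, not real ones.
packed_text = (
    "# pack-refs with: peeled fully-peeled\n"
    + "1" * 40 + " refs/heads/pu\n"
    + "2" * 40 + " refs/tags/v1.0\n"
    + "^" + "3" * 40 + "\n"
)
loose = {"refs/heads/pu": "4" * 40}

for name, sha in sorted(effective_refs(loose, packed_text).items()):
    print(sha, name)
```

With that, the stale packed value of a rewound branch like pu is never
advertised, so the connectivity check would not go looking for an object
that may already have been pruned.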

-Peff