Re: [PATCH] fix simple deepening of a repo

Nicolas Pitre <nico@xxxxxxx> wrote:
> Well... Johan Herland says he has to deal with repositories containing 
> around 50000 refs.  So in that case it is certainly a good idea not to 
> send the whole 50000 refs back if only one or two (or a hundred) need to 
> be updated.  And quickfetch() won't help in that case since its purpose 
> is only to determine if there is anything at all to update.
...
> 50000 refs * 45 bytes each = 2.25 MB.  That's all wasted bandwidth if 
> only one ref needs updating.

Not just Johan Herland.  Gerrit Code Review creates a new ref
for every patch proposed for review.  Imagine taking every email
message on git ML that has "[PATCH]" in the subject, and creating
a new ref for that in a git.git clone.

We aren't quite at the 50k ref stage yet, but we're starting to
consider that some of our repositories have a ton of refs, and
that the initial advertisement for either fetch or push is horrid.

Since the refs are immutable I could actually teach the JGit
daemon to hide them from JGit's receive-pack, thus cutting down the
advertisement on push, but the refs exist so you can literally say:

  git fetch URL refs/changes/88/4488/2
  git show FETCH_HEAD

to inspect the "v2" version of whatever 4488 is, and if 4488 was
the last commit in a patch series, you'd also be able to do:

  git log -p --reverse ..FETCH_HEAD

to see the complete series.
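
As a rough sketch, the ref name used above can be built from the change
number and patch set number. This assumes the refs/changes/ layout is
refs/changes/NN/CHANGE/PATCHSET, with NN being the last two digits of the
change number; change_ref is a made-up helper name, not part of git or
Gerrit:

```shell
# Hypothetical helper: build a refs/changes/ name from a change number and
# a patch set number. Assumes the layout refs/changes/NN/CHANGE/PATCHSET,
# where NN is the last two digits of the change number, zero-padded.
change_ref() {
  change=$1
  patchset=$2
  shard=$(printf '%02d' $((change % 100)))
  printf 'refs/changes/%s/%s/%s\n' "$shard" "$change" "$patchset"
}

change_ref 4488 2   # prints refs/changes/88/4488/2
```

With that in place, the fetch above could be written as
git fetch URL "$(change_ref 4488 2)".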

Given how infrequently anyone grabs a given change, though, I'm
starting to consider either a protocol extension that allows the
client to probe for a ref which wasn't in the initial advertisement,
or taking it as a command line flag, e.g.:

  git fetch --uploadpack='git upload-pack --ref refs/changes/88/4488/2' URL refs/changes/88/4488/2

Personally I'd prefer extending the protocol, because making the
end user supply information twice is stupid.

I don't know enough about Johan's case though to know whether or
not he can get away with hiding the bulk of the refs in the initial
advertisement.  In the case of Gerrit Code Review, the bulk of the
refs are under refs/changes/, only a handful of things are under the
refs/heads/ and refs/tags/ namespaces, and most fetches actually are
for only refs/heads/ and refs/tags/.  So hiding the refs/changes/
namespace would make a large improvement in the advertisement cost.
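
To put a rough number on that, here is a minimal sketch that estimates
the advertisement size with and without refs/changes/.  It reuses Nico's
45-bytes-per-ref average from the quoted text rather than measuring the
real wire format, and advert_cost is a made-up name:

```shell
# Rough estimate only: assumes a flat 45 bytes per advertised ref (the
# average used in the quoted message), not the exact wire format.
# Reads one ref name per line on stdin.
advert_cost() {
  awk -v per_ref=45 '
    { total++ }                              # every ref is advertised today
    $0 !~ /^refs\/changes\// { shown++ }     # refs left if refs/changes/ is hidden
    END {
      printf "all refs: %d bytes\n", total * per_ref
      printf "without refs/changes/: %d bytes\n", shown * per_ref
    }'
}

printf '%s\n' refs/heads/master refs/tags/v1.0 \
  refs/changes/88/4488/1 refs/changes/88/4488/2 | advert_cost
# prints:
#   all refs: 180 bytes
#   without refs/changes/: 90 bytes
```

Against a real server the ref names could come from something like
git ls-remote URL | cut -f2 | advert_cost.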

-- 
Shawn.