Re: Why does send-pack call pack-objects for all remote refs?

Daniel Koverman <dkoverman@xxxxxxxxxxxxxxxxxxxxxxxxxx> writes:

> I have a repository which has ~2000 branches on the remote, and it
> takes ~8 seconds to push a change to one ref. The majority of this
> time is spent in pack-object. I wrote a hack so that only the ref
> being updated would be packed (the normal behavior is to pack for
> every ref on the remote).

I am having a hard time understanding what you are trying to say, as
nobody's pack-objects "packs for a ref" or "packs a ref", so my
response has to be based on my best guess---I think you are talking
about feeding the object names of the tips of all remote refs as
the bottoms of the revision range to pack-objects.

When you are pushing your 'topic' branch to update the 'topic'
branch at the remote, it is true that we compute

	git rev-list --objects $your_topic --not $all_of_the_remote_refs

to produce a packfile.  And by tweaking this to

	git rev-list --objects $your_topic --not $their_topic

you will cut down the processing time of 'rev-list', especially if
you have an insane number of refs at the remote end.
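
To make the difference between the two ranges concrete, here is a
toy sketch in Python. It is only a model under stated assumptions,
not how Git implements the walk: history is a plain dict from commit
name to parents, only commits are modeled (no trees or blobs), and
all names (reachable, rev_list, r0..r1999) are made up for
illustration. Real rev-list stops walking much earlier than this
model does; the point is only how many bottoms it is handed.

```python
def reachable(parents, tips):
    """Return every commit reachable from any commit in `tips`."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def rev_list(parents, wants, bottoms):
    """Toy 'rev-list wants --not bottoms': a set difference."""
    return reachable(parents, wants) - reachable(parents, bottoms)


# Hypothetical history: A <- B <- C <- D, where the remote 'topic'
# is at C and your 'topic' is at D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}

# Assume 2000 other remote branches, each one commit on top of A.
other_refs = []
for i in range(2000):
    name = f"r{i}"
    parents[name] = ["A"]
    other_refs.append(name)

all_remote = ["C"] + other_refs

full = rev_list(parents, ["D"], all_remote)   # --not $all_of_the_remote_refs
narrow = rev_list(parents, ["D"], ["C"])      # --not $their_topic

print(sorted(full), sorted(narrow))
print(len(reachable(parents, all_remote)), len(reachable(parents, ["C"])))
```

In this particular history both ranges select the same commit, but
the first one has to consider 2001 bottoms and the second only one.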

There is a price you would pay for doing so, though.  An obvious one
is the case where the 'topic' branch does not exist yet at the
remote.  Without the "--not ..." part, you would end up sending the
entire history behind $your_topic, and the way you prevent that from
happening is to list, as bottoms, what is known to exist at the
remote end.
Even when there already is a 'topic' at the remote, the contents at
the paths that differ between your 'topic' and the remote's 'topic'
may already exist on some other branches at the remote.  For
example, you may have merged some branches that are common between
your repository and the remote, in which case the only objects
missing from the remote that your repository has to send may be a
merge commit and the top-level tree object.  Limiting the bottoms of
the revision range to "--not $their_topic" alone would rob you of
this obvious optimization opportunity.
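
Both costs can be seen in a small Python sketch. Again this is a toy
model under assumptions, not Git's code: it tracks commits only, and
the history and names (base, T1, S1, S2, M, to_send) are
hypothetical. The remote is assumed to have 'topic' at T1 and 'side'
at S2, and your 'topic' is a merge M of the two.

```python
def reachable(parents, tips):
    """Return every commit reachable from any commit in `tips`."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def to_send(parents, want, bottoms):
    """Commits the toy 'push' would pack: want minus the bottoms."""
    return reachable(parents, [want]) - reachable(parents, bottoms)


# base <- T1 (remote 'topic'), base <- S1 <- S2 (remote 'side'),
# and M merges T1 and S2 (your 'topic').
parents = {
    "base": [],
    "T1": ["base"],
    "S1": ["base"],
    "S2": ["S1"],
    "M": ["T1", "S2"],
}
remote_refs = {"topic": "T1", "side": "S2"}

with_all = to_send(parents, "M", remote_refs.values())
only_topic = to_send(parents, "M", [remote_refs["topic"]])
no_bottoms = to_send(parents, "M", [])  # 'topic' absent, nothing given

print(sorted(with_all))
print(sorted(only_topic))
print(sorted(no_bottoms))
```

With every remote ref as a bottom only the merge is selected; with
only the remote 'topic' the side branch is selected too, although
the remote already has it; with no bottoms at all the whole history
is selected.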

There has to be some way to limit the list of remote-refs that are
used as bottoms of the revision range.  For example, if you know
that the remote has all the tags, and that everything in the v1.0
tag is contained in the v2.0 tag, then a single "--not v2.0" should
give the same result as "--not v1.0 v2.0" that lists both.  But the
computation that is needed to figure out which tags and branches are
not worth listing as bottoms would need to look at all of them at
least once anyway, so a naive implementation of such would end up
spending the same cycles, I would suspect.
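
A naive version of that pruning might look like the Python sketch
below. It is only an illustration under assumptions (a dict-based
commit graph, hypothetical commit names, and a made-up helper
prune_bottoms), not something Git provides. Note that it walks from
every candidate bottom before it can drop any of them, which is
exactly the cost mentioned above.

```python
def reachable(parents, tips):
    """Return every commit reachable from any commit in `tips`."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def prune_bottoms(parents, bottoms):
    """Drop every bottom that is contained in another bottom."""
    bottoms = list(dict.fromkeys(bottoms))  # drop exact duplicates
    # One walk per candidate: this is where the cycles go.
    closures = {b: reachable(parents, [b]) for b in bottoms}
    kept = []
    for b in bottoms:
        covered = any(o != b and b in closures[o] for o in bottoms)
        if not covered:
            kept.append(b)
    return kept


# Hypothetical history: root <- v1.0 <- mid <- v2.0 <- new
parents = {
    "root": [],
    "v1.0": ["root"],
    "mid": ["v1.0"],
    "v2.0": ["mid"],
    "new": ["v2.0"],
}

kept = prune_bottoms(parents, ["v1.0", "v2.0"])
both = reachable(parents, ["new"]) - reachable(parents, ["v1.0", "v2.0"])
single = reachable(parents, ["new"]) - reachable(parents, kept)

print(kept, sorted(both), sorted(single))
```

Here only v2.0 survives the pruning, and the range computed with the
pruned list is the same as the one computed with both tags.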

It is also unclear whether you are working with a shallow
repository.  The performance trade-off made between the pack size
and the cycles is somewhat different between a normal and a shallow
repository, e.g. 2dacf26d (pack-objects: use
--objects-edge-aggressive for shallow repos, 2014-12-24) might be a
good starting point to think about this issue.