Re: git pack/unpack over bittorrent - works!

On Sat, 4 Sep 2010, Luke Kenneth Casson Leighton wrote:

> * git-index-pack requires a pack file in order to re-create the index:
> i don't want that
> * git-pack-objects appears to have no way of telling it "just gimme
> index file please"
> * fast-import.c appears not to be what's needed either.
> 
> so - any other methods for just getting the index file (exclusively?)
> any other commands i've missed?  if not, are there any other ways of
> getting a pack's index of objects without err... getting the index
> file?  (i believe the answer to be no, but i'm just making sure) and
> on that basis i believe it is safe to ask: any objections to a patch
> which adds "--index-only" to builtin/pack-objects.c?

No patch is needed.

First, what you want is an index of the objects you are willing to share, 
not the index of whatever pack file you happen to have on disk, 
especially since having multiple packs is typical.

Try this instead:

    git rev-list --objects HEAD | cut -c -40 | sort

That will give you a sorted list of all objects reachable from the 
current branch.  With the Linux repo, you may replace "HEAD" with 
"v2.6.34..v2.6.35" if you wish, and that would give you the list of the 
new objects that were introduced between v2.6.34 and v2.6.35.  This will 
provide you with 84642 objects instead of the 1.7 million objects that 
the Linux repo contains (easier when testing stuff).
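
A variant of the pipeline above, as a rough sketch: cutting at the first space instead of at column 40 gives the same ids without hard-coding the hash length, and LC_ALL=C makes the sort bytewise. The function name list_object_ids is made up for illustration; it only expects rev-list style lines on stdin.

```shell
# Hypothetical helper: reduce `git rev-list --objects` output to sorted ids.
# Each input line is either "<id>" or "<id> <path>"; keep only the id.
list_object_ids() {
    cut -d' ' -f1 | LC_ALL=C sort
}

# Intended use (needs a Git repository, so left as a comment):
#   git rev-list --objects v2.6.34..v2.6.35 | list_object_ids | wc -l
```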

That sorted list of objects is more or less what the pack index file 
contains, plus an offset in the pack for each entry.  It is used to 
quickly find the offset for a given object in the corresponding pack 
file, and the fanout is only a way to cut 3 iterations in the binary 
search.
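
To make the role of the fanout concrete, here is a small sketch, assuming the fanout is a 256-entry table in which entry N holds the number of objects whose first byte is N or less. Given a sorted id list, the first byte of the wanted id then bounds the slice the binary search has to look at. The helper name fanout_range is hypothetical, and awk merely stands in for the table lookup.

```shell
# Hypothetical helper: read sorted ids on stdin and print "LO HI", the
# half-open range of 0-based positions whose first byte equals $1
# (two lowercase hex digits). LO is the count of ids sorting before that
# byte, HI the count of ids up to and including it.
fanout_range() {
    LC_ALL=C awk -v want="$1" '
        { byte = substr($0, 1, 2) }
        byte <  (want "") { lo++ }   # ids sorting before the bucket
        byte <= (want "") { hi++ }   # ids up to the end of the bucket
        END { print lo + 0, hi + 0 }'
}

# Intended use (needs a Git repository, so left as a comment):
#   git rev-list --objects HEAD | cut -d' ' -f1 | LC_ALL=C sort | fanout_range 3f
```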

But anyway, what you really want is to select the precise set of objects 
you wish to share, not to blindly use the pack index file.  If you 
have a public branch and a private branch in your repository, then 
objects from both branches may end up in the same pack, and you probably 
don't want to publish the objects from the private branch.  The only 
reliable way to generate a list of objects is to use the output from 'git 
rev-list'.  Those objects may come from one or multiple packs, or be 
loose in the object subdirectories, or even be borrowed from another 
repository through the alternates mechanism.  But rev-list will dig up 
those object SHA1s for you, and only those you asked for.
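
One way to see the difference in practice, sketched under a few assumptions: the branch names public and private are hypothetical, pack-XXXX.idx is a placeholder for a real index file name, and git show-index is used to dump the ids recorded in that index. Intersecting the dump with the objects reachable only from the private branch shows what publishing the raw pack index would expose. The helper name private_ids_in_pack is made up for this example.

```shell
# Hypothetical helper: print the ids that appear in both sorted files.
#   $1: ids found in a pack index
#   $2: ids reachable only from the private branch
private_ids_in_pack() {
    LC_ALL=C comm -12 "$1" "$2"
}

# Intended use (needs a Git repository, so left as comments):
#   git show-index < .git/objects/pack/pack-XXXX.idx |
#       cut -d' ' -f2 | LC_ALL=C sort > pack-ids.txt
#   git rev-list --objects private ^public |
#       cut -d' ' -f1 | LC_ALL=C sort > private-only.txt
#   private_ids_in_pack pack-ids.txt private-only.txt
```

Any id printed is an object that sits in the pack but is not reachable from the public branch.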

You should look at the Git documentation for plumbing commands.  The 
plumbing is actually a toolset that allows you to manipulate and extract 
information from a Git repository.  This is really handy for prototyping 
new functionality.  Initially, the Git user interface was all 
implemented in shell scripts on top of that plumbing.

Back to that rev-list output... OK, you want the equivalent of a fanout 
table.  You may do something like this then:

    git rev-list --objects v2.6.34..v2.6.35 | cut -c -2 | sort | uniq -c

And so on.
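
Going one step further, the per-prefix counts can be turned into running totals, which is the shape a 256-entry fanout table would have if entry N counts all objects whose first byte is N or less. This is only a sketch: fanout_table is a hypothetical name, and it expects lowercase hex ids on stdin.

```shell
# Hypothetical helper: read object ids on stdin and print 256 lines of
# "<first byte in hex> <number of ids whose first byte is <= that value>".
fanout_table() {
    LC_ALL=C awk '
        { count[substr($0, 1, 2)]++ }        # tally ids per first byte
        END {
            total = 0
            for (i = 0; i < 256; i++) {
                key = sprintf("%02x", i)
                total += count[key]           # missing buckets add 0
                print key, total
            }
        }'
}

# Intended use (needs a Git repository, so left as a comment):
#   git rev-list --objects v2.6.34..v2.6.35 | fanout_table
```

The last line (ff) always carries the total number of ids read.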


Nicolas