Re: git pack/unpack over bittorrent - works!

On Sun, Sep 5, 2010 at 3:16 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Sat, 4 Sep 2010, Luke Kenneth Casson Leighton wrote:
>
>> * git-index-pack requires a pack file in order to re-create the index:
>> i don't want that
>> * git-pack-objects appears to have no way of telling it "just gimme
>> index file please"
>> * fast-import.c appears not to be what's needed either.
>>
>> so - any other methods for just getting the index file (exclusively?)
>> any other commands i've missed?  if not, are there any other ways of
>> getting a pack's index of objects without err... getting the index
>> file?  (i believe the answer to be no, but i'm just making sure) and
>> on that basis i believe it is safe to ask: any objections to a patch
>> which adds "--index-only" to builtin/pack-objects.c?
>
> No patch is needed.
>
> First, what you want is an index of objects you are willing to share,
> and not the index of whatever pack file you might have on your disk,
> especially if you have multiple packs, which is typical.

 blast.  so *sigh* ignoring the benefits that can be obtained by the
delta-compression thing, somewhat; ignoring the fact that perhaps less
traffic might be transferred by happening to borrow objects from
another branch (which is the situation that, i believe, happens with
"git pull" over http:// or git://); ignoring the fact that i actually
implemented it using the .idx file yesterday ... :)

 ... there is a bit of a disadvantage to using pack index files, in that
the index goes all the way down (if i am reading things correctly) and cannot
be told "give me just the objects related to a particular commit"....


> Try this instead:
>
>    git rev-list --objects HEAD | cut -c -40 | sort
>
> That will give you a sorted list of all objects reachable from the
> current branch.  With the Linux repo, you may replace "HEAD" with
> "v2.6.34..v2.6.35" if you wish, and that would give you the list of the
> new objects that were introduced between v2.6.34 and v2.6.35.
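For anyone driving this from Python rather than the shell, here is a minimal sketch of the same pipeline. It assumes git is on PATH and a SHA-1 repository (40-hex object names, as throughout this thread). The helper names `object_ids` and `reachable_objects` are hypothetical, not part of any library; `object_ids` does the `cut -c -40 | sort` step in-process.

```python
import subprocess


def object_ids(rev_list_output: str) -> list[str]:
    """Mimic `cut -c -40 | sort` on `git rev-list --objects` output.

    Commit lines are a bare 40-hex name; tree and blob lines are the name
    followed by a space and a path. Keep only the name, then sort.
    Assumes SHA-1 (40 hex chars) object names.
    """
    ids = [line[:40] for line in rev_list_output.splitlines() if line]
    return sorted(ids)


def reachable_objects(revs: str = "HEAD", repo: str = ".") -> list[str]:
    """Run `git rev-list --objects <revs>` in `repo`; return sorted names.

    `revs` may be a single ref or a range such as "v2.6.34..v2.6.35".
    """
    out = subprocess.run(
        ["git", "rev-list", "--objects", revs],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return object_ids(out)
```

Sorting in Python gives the same order as `sort` in the C locale here, since the names are lowercase hex.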

 ... unlike this, which is in fact much more along the lines of what i
was looking for (minus the loveliness of the delta compression, oh
well).

> This will
> provide you with 84642 objects instead of the 1.7 million objects that
> the Linux repo contains (easier when testing stuff).

 hurrah! :)  [but then, if you actually want to go back and get all
commits, that's ... well, we'll not worry about that too much, given
the benefits of being able to get smaller chunks.]

> That sorted list of objects is more or less what the pack index file
> contains, plus an offset in the pack for each entry.  It is used to
> quickly find the offset for a given object in the corresponding pack
> file, and the fanout is only a way to cut 3 iterations in the binary
> search.
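To make the fanout point concrete, here is a toy in-memory model. It assumes object names are held as sorted lowercase hex strings, rather than the 20-byte binary entries of a real .idx file. `fanout[b]` counts the names whose first byte is <= b, so the names starting with byte b sit at positions `fanout[b-1]` through `fanout[b]-1`. The binary search then only runs over that slice. In the real file, the position found would index a parallel table of pack offsets, which is omitted here. `build_fanout` and `find_position` are illustrative names only.

```python
from bisect import bisect_left


def build_fanout(sorted_ids: list[str]) -> list[int]:
    """256 cumulative counts: fanout[b] = number of names with first byte <= b."""
    counts = [0] * 256
    for oid in sorted_ids:
        counts[int(oid[:2], 16)] += 1
    fanout, total = [], 0
    for c in counts:
        total += c
        fanout.append(total)
    return fanout


def find_position(sorted_ids: list[str], fanout: list[int], oid: str) -> int:
    """Return the index of `oid` in `sorted_ids`, or -1 if absent.

    The fanout narrows the binary search to names sharing oid's first byte.
    """
    first = int(oid[:2], 16)
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    i = bisect_left(sorted_ids, oid, lo, hi)
    if i < hi and sorted_ids[i] == oid:
        return i
    return -1
```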
>
> But anyway, what you want is really to select the precise set of objects
> you wish to share, and not blindly using the pack index file.  If you
> have a public branch and a private branch in your repository, then
> objects from both branches may end up in the same pack

 slightly confused: are you of the belief that i intend to ignore
refs/branches/* starting points?

> and you probably
> don't want to publish those objects from the private branch.

 ahh, i wondered where i'd seen the bit about "confusing" two
branches; i thought it was in another message.  so many flying back &
forth :)  from what i can gather, this is exactly what happens with
git fetch from http:// or git://, so what's the big deal about that?
why stop gitp2p from benefitting from the extra compression that could
result from "borrowing" bits of another branch's objects, neh?

 or... have i misunderstood?

> The only
> reliable way to generate a list of objects is to use the output from 'git
> rev-list'.  Those objects may come from one or multiple packs, or be
> loose in the object subdirectories, or even borrowed from another
> repository through the alternates mechanism.  But rev-list will dig
> those object SHA1s for you and only those you asked for.

 excellent.  that's probably what i need right now.

> You should look at the Git documentation for plumbing commands.  The
> plumbing is actually a toolset that allows you to manipulate and extract
> information from a Git repository.  This is really handy for prototyping
> new functionality. Initially, the Git user interface was all
> implemented in shell scripts on top of that plumbing.

 i'm using gitdb (ok, i don't need that any more if i don't walk the
pack-index file *sigh*) and python-git, and am quite happy with the speed
at which i can knock stuff together using it.  the only tricky, wobbly
moment i had was not being able to pass in a file-handle as stdin (of git
pack-objects), and i got round that with "input = os.tmpfile();
input.write(objref+"\n"); input.seek(0)".
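A side note on that stdin workaround: os.tmpfile() exists only in Python 2. In Python 3, the standard subprocess module's `subprocess.run(..., input=...)` hands bytes straight to the child's stdin, so no temporary file is needed. The following is a minimal sketch, assuming git is on PATH and the listed objects exist in the repository. `pack_objects_input` and `make_pack` are illustrative names, not a python-git API. `--stdout` tells `git pack-objects` to write the pack to stdout instead of to files on disk.

```python
import subprocess


def pack_objects_input(object_names) -> bytes:
    """One object name per line: the list `git pack-objects` reads on stdin."""
    return "".join(name + "\n" for name in object_names).encode("ascii")


def make_pack(object_names, repo: str = ".") -> bytes:
    """Build a pack containing `object_names` and return the raw pack bytes.

    The object list is passed directly via `input=`, with no temp file.
    """
    result = subprocess.run(
        ["git", "pack-objects", "--stdout"],
        input=pack_objects_input(object_names),
        cwd=repo, capture_output=True, check=True,
    )
    return result.stdout
```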

> Back to that rev-list output... OK, you want the equivalent of a fanout
> table.  You may do something like this then:
>
>    git rev-list --objects v2.6.34..v2.6.35 | cut -c -2 | sort | uniq -c

  ack.  got it.

 thanks nicolas.

l.

