On Sun, Sep 5, 2010 at 3:16 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Sat, 4 Sep 2010, Luke Kenneth Casson Leighton wrote:
>
>> * git-index-pack requires a pack file in order to re-create the index:
>> i don't want that
>> * git-pack-objects appears to have no way of telling it "just gimme
>> index file please"
>> * fast-import.c appears not to be what's needed either.
>>
>> so - any other methods for just getting the index file (exclusively?)
>> any other commands i've missed? if not, are there any other ways of
>> getting a pack's index of objects without err... getting the index
>> file? (i believe the answer to be no, but i'm just making sure) and
>> on that basis i believe it is safe to ask: any objections to a patch
>> which adds "--index-only" to builtin/pack-objects.c?
>
> No patch is needed.
>
> First, what you want is an index of objects you are willing to share,
> and not the index of whatever pack file you might have on your disk,
> especially if you have multiple packs which is typical.

 blast. so *sigh* ignoring the benefits that can be obtained by the
delta-compression thing, somewhat; ignoring the fact that perhaps less
traffic might be transferred by happening to borrow objects from
another branch (which is the situation that, i believe, happens with
"git pull" over http:// or git://); ignoring the fact that i actually
implemented using the .idx file yesterday ... :)

 ... there is a bit of a disadvantage to using pack index files that
it goes all the way down (if i am reading things correctly) and cannot
be told "give me just the objects related to a particular commit"....

> Try this instead:
>
>     git rev-list --objects HEAD | cut -c -40 | sort
>
> That will give you a sorted list of all objects reachable from the
> current branch. With the Linux repo, you may replace "HEAD" with
> "v2.6.34..v2.6.35" if you wish, and that would give you the list of the
> new objects that were introduced between v2.6.34 and v2.6.35.

 ...
unlike this, which is in fact much more along the lines of what i was
looking for (minus the loveliness of the delta compression oh well)

> This will
> provide you with 84642 objects instead of the 1.7 million objects that
> the Linux repo contains (easier when testing stuff).

 hurrah! :) [but, then if you actually want to go back and get all
commits, that's ... well, we'll not worry about that too much, given
the benefits of being able to get smaller chunks.]

> That sorted list of objects is more or less what the pack index file
> contains, plus an offset in the pack for each entry. It is used to
> quickly find the offset for a given object in the corresponding pack
> file, and the fanout is only a way to cut 3 iterations in the binary
> search.
>
> But anyway, what you want is really to select the precise set of objects
> you wish to share, and not blindly using the pack index file. If you
> have a public branch and a private branch in your repository, then
> objects from both branches may end up in the same pack

 slightly confused: are you of the belief that i intend to ignore
refs/branches/* starting points?

> and you probably
> don't want to publish those objects from the private branch.

 ahh, i wondered where i'd seen the bit about "confusing" two
branches, i thought it was in another message. so many flying back &
forth :)

 from what i can gather, this is exactly what happens with git fetch
from http:// or git:// so what's the big deal about that? why stop
gitp2p from benefitting from the extra compression that could result
from "borrowing" bits of another branch's objects, neh? or .. have i
misunderstood?

> The only
> reliable way to generate a list of object is to use the output from 'git
> rev-list'. Those objects may come from one or multiple packs, or be
> loose in the object subdirectories, or even borrowed from another
> repository through the alternates mechanism. But rev-list will dig
> those object SHA1s for you and only those you asked for.
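
 for reference, the quoted description of the pack index (a sorted
list of object names plus a fanout table that narrows the binary
search) can be sketched roughly as below. this is only an illustration
working on hex strings in memory, not the real binary .idx format;
build_fanout and find_object are made-up names, and the sample ids are
sha1 hashes of small integers standing in for real object names.

```python
import bisect
import hashlib


def build_fanout(sorted_ids):
    """Return 256 cumulative counts: fanout[b] is the number of ids
    whose first byte is <= b. (Hypothetical helper, hex-string input.)"""
    counts = [0] * 256
    for oid in sorted_ids:
        counts[int(oid[:2], 16)] += 1
    fanout, total = [], 0
    for count in counts:
        total += count
        fanout.append(total)
    return fanout


def find_object(sorted_ids, fanout, oid):
    """Return the position of oid in sorted_ids, or -1 if it is absent.
    The fanout table restricts the binary search to ids sharing the
    same first byte."""
    first = int(oid[:2], 16)
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    pos = bisect.bisect_left(sorted_ids, oid, lo, hi)
    if pos < hi and sorted_ids[pos] == oid:
        return pos
    return -1


if __name__ == "__main__":
    # Stand-in ids; with a real repository these would come from
    # "git rev-list --objects ... | cut -c -40 | sort".
    ids = sorted(hashlib.sha1(str(i).encode()).hexdigest() for i in range(1000))
    fanout = build_fanout(ids)
    print(find_object(ids, fanout, ids[42]))
```

 a real index entry would also carry the object's offset in the pack,
which this sketch leaves out.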
 excellent. that's probably what i need right now.

> You should look at the Git documentation for plumbing commands. The
> plumbing is actually a toolset that allows you to manipulate and extract
> information from a Git repository. This is really handy for prototyping
> new functionalities. Initially, the Git user interface was all
> implemented in shell scripts on top of that plumbing.

 i'm using gitdb (ok don't need that any more, if i don't walk the
pack-index file *sigh*) and python-git - am quite happy with the speed
at which i can knock stuff together, using it. the only tricky wobbly
moment i had was not being able to pass in a file-handle to stdin (git
pack-objects) and i got round that with "input = os.tmpfile();
input.write(objref+"\n"); input.seek(0)".

> Back to that rev-list output... OK, you want the equivalent of a fanout
> table. You may do something like this then:
>
>     git rev-list --objects v2.6.34..v2.6.35 | cut -c -2 | sort | uniq -c

 ack. got it.

 thanks nicolas.

l.
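
 p.s. on the stdin workaround mentioned above: os.tmpfile() only exists
in python 2, so here is a rough sketch of the same idea for python 3,
handing the object names to the child process through a pipe instead
of a temporary file. run_with_stdin is a made-up helper name, and the
demo uses a stand-in child process that just echoes its input, so that
it runs without a repository.

```python
import subprocess
import sys


def run_with_stdin(cmd, lines):
    """Run cmd with the given lines on its stdin; return raw stdout bytes.

    run_with_stdin is a hypothetical helper, not part of any git library.
    """
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    # input= passes the bytes to the child through a pipe, so no
    # temporary file is needed; check=True raises if the child fails.
    result = subprocess.run(cmd, input=payload, stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in child that copies stdin to stdout unchanged.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    print(run_with_stdin(echo, ["0" * 40]))
```

 against a real repository the call would presumably look like
run_with_stdin(["git", "pack-objects", "--stdout"], [objref]), with
objref being the object name as in the snippet above; the output is
kept as bytes because a pack is binary data.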