On Sun, 6 Aug 2006 00:08:48 -0400 Shawn Pearce wrote: > Jon Smirl <jonsmirl@xxxxxxxxx> wrote: > > On 8/5/06, Shawn Pearce <spearce@xxxxxxxxxxx> wrote: > > >Jon Smirl <jonsmirl@xxxxxxxxx> wrote: [...] > > >> Why does resolve_delta in index-pack.c need to be recursive? Is there > > >> a better way to code that routine? If it mmaps the file that uses 1GB > > >> address space, why does it need another 1.5GB to build an index? > > > > > >Probably the easiest way to code the routine. Delta depth is > > >bounded; in the fast-import.c that I sent out last night I hardcoded > > >it to 10, which is (I believe) the default for GIT. So long as that > > >routine is recursive only along a single delta chain the recursion > > >depth won't be very high and shouldn't be the problem. > > > > When I put index-pack in gdb at the seg fault, resolve_delta had > > recursed more than 20,000 times. I stopped looking after that. > > Ouch. I'm not familiar with this code, but looking at right now its > also not entirely obviously what its recursing for. Plus dinner > is trying to be burned on the grill, so my attention is on that > more than on GIT. :-) git-pack-objects never creates a pack file with duplicate objects, therefore git-index-pack was never tested on such pack files - no wonder that it breaks. The case of patch revert (A -> B -> A again) is probably the problem - your dumb pack generator will probably write this: A B (delta based on A) A (delta based on B) git-index-pack will first unpack the first copy of A, then notice that A is used as a delta base for B and apply the delta, then it will find the second copy of A and apply that delta, and then it will find B again... Please try the patch at the end of this message - it should help to avoid the infinite recursion in git-index-pack. However, I'm not sure that other git parts won't do something bad when they encounter an index with duplicate sha1 entries (and git-index-pack cannot remove these duplicates, because the number of index entries must match the pack header). > > >> I had a prior 400MB pack file built with fast-import that I was able > > >> to index ok. > > > > > >Dumb luck? Maybe that had no duplicates while this one does? > > > > Is there a git command to list the sha1's in a pack that doesn't have > > an index? I could sort it, sort it unqiue, and then diff the outputs. git-index-pack :) (git-unpack-objects will also have all sha1 values at the end, but the side effect of unpacking all objects to separate files is probably not what you want to get.) > Not that I know of. Packs themselves don't have the SHA1 values and > getting them from a pack without an index is a painful exercise as > you don't know where the base of an object resides within the pack > when you need it to generate the object's raw content to determine > its ID. Yes, this is a problem. git-unpack-objects can unpack all objects in a single pass, but only because it temporarily saves all deltas which cannot be resolved yet, and can read back objects which it has written before. git-index-pack needs to work without modifying the object database, so it works in two passes: /* * First pass: * - find locations of all objects; * - calculate SHA1 of all non-delta objects; * - remember base SHA1 for all deltas. */ /* * Second pass: * - for all non-delta objects, look if it is used as a base for * deltas; * - if used as a base, uncompress the object and apply all deltas, * recursively checking if the resulting object is used as a base * for some more deltas. */ ----------------------------------------------------------------------- From: Sergey Vlasov <vsu@xxxxxxxxxxx> Date: Sun, 6 Aug 2006 17:28:29 +0400 Subject: [PATCH] git-index-pack: Avoid infinite recursion if the pack has duplicate objects Although git-pack-objects never creates packs which contain the same object more than once, other tools may be not so careful, so add a check to prevents infinite recursion of resolve_delta() for such packs. Signed-off-by: Sergey Vlasov <vsu@xxxxxxxxxxx> --- index-pack.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/index-pack.c b/index-pack.c index b39953d..a8e3b1f 100644 --- a/index-pack.c +++ b/index-pack.c @@ -257,6 +257,13 @@ static void resolve_delta(struct delta_e unsigned long next_obj_offset; int j, first, last; + /* + * Do nothing if this delta was resolved earlier (this can happen if + * the pack contains duplicate objects for some reason). + */ + if (obj->real_type != OBJ_DELTA) + return; + obj->real_type = type; delta_data = unpack_raw_entry(obj->offset, &delta_type, &delta_size, base_sha1, -- 1.4.2.rc3.g23aa
Attachment:
pgpbep3OdJ6iK.pgp
Description: PGP signature