Jeff King <peff@xxxxxxxx> writes: > ... > We could do analysis on any cycles that we find to > distinguish the two cases (i.e., it is a bogus pack if and > only if every delta in the cycle is in the same pack), but > we don't need to. If there is a cycle inside a pack, we'll > run into problems not only reusing the delta, but accessing > the object data at all. So when we try to dig up the actual > size of the object, we'll hit that same cycle and kick in > our usual complain-and-try-another-source code. I agree with all of the above reasoning. > Actually, skimming the sha1_file code, I am not 100% sure that we detect > cycles in OBJ_REF_DELTA (you cannot have cycles in OBJ_OFS_DELTA since > they always point backwards in the pack). But if that is the case, then > I think we should fix that, not worry about special-casing it here. Yes, but sha1_file.c? It is the reading side and it is too late if we notice a problem, I would think. > +/* > + * Drop an on-disk delta we were planning to reuse. Naively, this would > + * just involve blanking out the "delta" field, but we have to deal > + * with two extra pieces of book-keeping: > + * > + * 1. Removing ourselves from the delta_sibling linked list. > + * > + * 2. Updating our size; check_object() will have filled in the size of our > + * delta, but a non-delta object needs it true size. Excellent point. > +/* > + * Follow the chain of deltas from this entry onward, throwing away any links > + * that cause us to hit a cycle (as determined by the DFS state flags in > + * the entries). > + */ > +static void break_delta_cycles(struct object_entry *entry) > +{ > + /* If it's not a delta, it can't be part of a cycle. */ > + if (!entry->delta) { > + entry->dfs_state = DFS_DONE; > + return; > + } > + > + switch (entry->dfs_state) { > + case DFS_NONE: > + /* > + * This is the first time we've seen the object. We mark it as > + * part of the active potential cycle and recurse. > + */ > + entry->dfs_state = DFS_ACTIVE; > + break_delta_cycles(entry->delta); > + entry->dfs_state = DFS_DONE; > + break; > + > + case DFS_DONE: > + /* object already examined, and not part of a cycle */ > + break; > + > + case DFS_ACTIVE: > + /* > + * We found a cycle that needs broken. It would be correct to > + * break any link in the chain, but it's convenient to > + * break this one. > + */ > + drop_reused_delta(entry); > + break; > + } > +} Do we need to do anything to the DFS state of an entry when drop_reused_delta() resets its other fields? If we had this and started from A (read "A--->B" as "A is based on B"): A--->B--->C--->A we paint A as ACTIVE, visit B and then C and paint them as active, and when we visit A for the second time, we drop it (i.e. break the link between A and B), return and paint C as DONE, return and paint B as DONE, and leaving A painted as ACTIVE, while the chain is now B--->C--->A If we later find D that is directly based on A, wouldn't we end up visiting A and attempt to drop it again? drop_reused_delta() is idempotent so there will be no data structure corruption, I think, but we can safely declare that the entry is now DONE after calling drop_reused_delta() on it (either in the function or in the caller after it calls the function), no? > + 2. Picking the next pack to examine based on locality (i.e., where we found > + something else recently). > + > + In this case, we want to make sure that we find the delta versions of A and > + B and not their base versions. We can do this by putting two blobs in each > + pack. The first is a "dummy" blob that can only be found in the pack in > + question. And then the second is the actual delta we want to find. > + > + The two blobs must be present in the same tree, not present in other trees, > + and the dummy pathname must sort before the delta path. > +# Create a pack containing the the tree $1 and blob $1:file, with > +# the latter stored as a delta against $2:file. > +# > +# We convince pack-objects to make the delta in the direction of our choosing > +# by marking $2 as a preferred-base edge. That results in $1:file as a thin > +# delta, and index-pack completes it by adding $2:file as a base. Tricky but clever and correct ;-) > +make_pack () { > + { > + echo "-$(git rev-parse $2)" Is everybody's 'echo' happy with dash followed by unknown string? > + echo "$(git rev-parse $1:dummy) dummy" > + echo "$(git rev-parse $1:file) file" > + } | > + git pack-objects --stdout | > + git index-pack --stdin --fix-thin An alternative git pack-objects --stdout <<-EOF | -$(git rev-parse $2) $(git rev-parse $1:dummy) dummy $(git rev-parse $1:file) file EOF git index-pack --stdin --fix-thin looks somewhat ugly, though. > +} -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html