On Tuesday 13 January 2009, Junio C Hamano wrote:
> Christian Couder <chriscool@xxxxxxxxxxxxx> writes:
> > diff --git a/replace_object.c b/replace_object.c
> > new file mode 100644
> > index 0000000..25b3ef3
> > --- /dev/null
> > +++ b/replace_object.c
> > @@ -0,0 +1,116 @@
> > +#include "cache.h"
> > +#include "refs.h"
> > +
> > +static struct replace_object {
> > +	unsigned char sha1[2][20];
> > +} **replace_object;
> > +
> > +static int replace_object_alloc, replace_object_nr;
> > +
> > +static int replace_object_pos(const unsigned char *sha1)
> > +{
> > +	int lo, hi;
> > +	lo = 0;
> > +	hi = replace_object_nr;
> > +	while (lo < hi) {
> > +		int mi = (lo + hi) / 2;
> > +		struct replace_object *rep = replace_object[mi];
> > +		int cmp = hashcmp(sha1, rep->sha1[0]);
> > +		if (!cmp)
> > +			return mi;
> > +		if (cmp < 0)
> > +			hi = mi;
> > +		else
> > +			lo = mi + 1;
> > +	}
> > +	return -lo - 1;
> > +}
>
> Hmm, this is a tangent of this topic, but I wonder if we can do
> something more generic to factor out the many binary searches like
> this that we have throughout the code.  Also I wonder if they can be
> made more efficient by taking advantage of the fact that our hash is
> expected to produce a good uniform distribution, similar to the way
> patch-ids.c::patch_pos() does this.
>
> I guess the performance should not matter much for this table, as we
> expect there won't be massive object replacements.
>
> Also I recall Dscho muttered something about hashmap that I didn't
> quite understand.

Yeah, maybe it is possible to get faster code and to refactor the many
binary searches we have, but I will leave that for later or for other
people interested in these topics, if you let me.

> > +static int register_replace_ref(const char *refname,
> > +				const unsigned char *sha1,
> > +				int flag, void *cb_data)
> > +{
> > +	/* Get sha1 from refname */
> > +	const char *slash = strrchr(refname, '/');
> > +	const char *hash = slash ?
> > +		slash + 1 : refname;
> > +	struct replace_object * repl_obj = xmalloc(sizeof(*repl_obj));
>
> 	struct replace_object *repl_obj = ...

Ok.

> > +	if (strlen(hash) != 40 || get_sha1_hex(hash, repl_obj->sha1[0])) {
> > +		free(repl_obj);
> > +		warning("bad replace ref name: %s", refname);
> > +	}
> > +
> > +	/* Copy sha1 from the read ref */
> > +	hashcpy(repl_obj->sha1[1], sha1);
>
> Upon an error, you free and warn and then still copy into it?

Oops, I forgot a "return 0;" statement after the warning.

> > +	/* Register new object */
> > +	if (register_replace_object(repl_obj, 1))
> > +		warning("duplicate replace ref: %s", refname);
>
> I'd say this is a grave error and should be reported as a repository
> corruption.

If we let people have one set of replace refs in "refs/replace/" and
another one in "refs/replace/bisect/", with the latter used only when
bisecting, then it may happen that the same commit has one ref in
"refs/replace/" and another one in "refs/replace/bisect/". In that
case we should probably not even warn, and the replace ref in
"refs/replace/bisect/" should simply win while bisecting.

But, as we don't have a mechanism to do that yet, you are right, we
should probably "die" here for now.

> > +static void prepare_replace_object(void)
> > +{
> > +	static int replace_object_prepared;
> > +
> > +	if (replace_object_prepared)
> > +		return;
> > +
> > +	for_each_replace_ref(register_replace_ref, NULL);
> > +	replace_object_prepared = 1;
> > +}
> > +
> > +/* We allow "recursive" replacement.
> > + * Only within reason, though */
> > +#define MAXREPLACEDEPTH 5
> > +
> > +const unsigned char *lookup_replace_object(const unsigned char *sha1)
> > +{
> > +	int pos, depth = MAXREPLACEDEPTH;
> > +	const unsigned char *cur = sha1;
> > +
> > +	prepare_replace_object();
> > +
> > +	/* Try to recursively replace the object */
> > +	do {
> > +		if (--depth < 0)
> > +			die("replace depth too high for object %s",
> > +			    sha1_to_hex(sha1));
> > +
> > +		pos = replace_object_pos(cur);
> > +		if (0 <= pos)
> > +			cur = replace_object[pos]->sha1[1];
> > +	} while (0 <= pos);
> > +
> > +	return cur;
> > +}
>
> Since your paradigm is to prepare the replacement table once at the
> beginning, never allowing it to be updated, I think you can update
> the table while you look it up.  E.g. if A->B and B->C exist, and A
> is asked for, you find A->B (to tentatively make cur point at B) and
> then you find B->C, and before returning you can rewrite the first
> mapping to A->C.  Later look-ups won't need to dereference the table
> twice that way.
>
> This assumes that there will be a small number of replacements, but
> that the same object can be asked for more than once during the
> process.

If we allow different sets of replace refs, for example A->B in
"refs/replace/" and B->C in "refs/replace/bisect/", then we cannot
rewrite as you suggest. We could add A->C in "refs/replace/bisect/",
so that it overrides A->B and B->C when we bisect, but we would not
gain much. So I prefer not to do that right now. Maybe later, if we
decide we really don't want to allow different sets of replace refs,
we can do what you suggest.
> > diff --git a/sha1_file.c b/sha1_file.c
> > index f08493f..4f2fd10 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -2163,10 +2163,18 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
> >  void *read_sha1_file(const unsigned char *sha1, enum object_type *type,
> >  		     unsigned long *size)
> >  {
> > -	void *data = read_object(sha1, type, size);
> > +	const unsigned char *repl = lookup_replace_object(sha1);
> > +	void *data = read_object(repl, type, size);
> > +
> > +	/* die if we replaced an object with one that does not exist */
> > +	if (!data && repl != sha1)
> > +		die("replacement %s not found for %s",
> > +		    sha1_to_hex(repl), sha1_to_hex(sha1));
> > +
> >  	/* legacy behavior is to die on corrupted objects */
> > -	if (!data && (has_loose_object(sha1) || has_packed_and_bad(sha1)))
> > -		die("object %s is corrupted", sha1_to_hex(sha1));
> > +	if (!data && (has_loose_object(repl) || has_packed_and_bad(repl)))
> > +		die("object %s is corrupted", sha1_to_hex(repl));
> > +
> >  	return data;
> >  }
>
> Later we'd need a global switch to forbid the replacement for
> connectivity walkers,

Yeah, I am slowly working on it. My next series (hopefully in a few
days), where the above errors are fixed, will include it.

> but other than that I think this is sane.
>
> I also looked at 1/ and 2/ which looked Ok.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html