On Tuesday 13 January 2009, Junio C Hamano wrote:
> Christian Couder <chriscool@xxxxxxxxxxxxx> writes:
> > diff --git a/replace_object.c b/replace_object.c
> > new file mode 100644
> > index 0000000..25b3ef3
> > --- /dev/null
> > +++ b/replace_object.c
> > @@ -0,0 +1,116 @@
> > +#include "cache.h"
> > +#include "refs.h"
> > +
> > +static struct replace_object {
> > +	unsigned char sha1[2][20];
> > +} **replace_object;
> > +
> > +static int replace_object_alloc, replace_object_nr;
> > +
> > +static int replace_object_pos(const unsigned char *sha1)
> > +{
> > +	int lo, hi;
> > +	lo = 0;
> > +	hi = replace_object_nr;
> > +	while (lo < hi) {
> > +		int mi = (lo + hi) / 2;
> > +		struct replace_object *rep = replace_object[mi];
> > +		int cmp = hashcmp(sha1, rep->sha1[0]);
> > +		if (!cmp)
> > +			return mi;
> > +		if (cmp < 0)
> > +			hi = mi;
> > +		else
> > +			lo = mi + 1;
> > +	}
> > +	return -lo - 1;
> > +}
>
> Hmm, this is a tangent of this topic, but I wonder if we can do
> something more generic to factor out the many binary searches like
> this that we have throughout the code.  Also I wonder if they can be
> made more efficient by taking advantage of the fact that our hash is
> expected to produce a good uniform distribution, similar to the way
> patch-ids.c::patch_pos() does this.
>
> I guess the performance should not matter much for this table, as we
> expect there won't be massive object replacements.
>
> Also I recall Dscho muttered something about hashmap that I didn't
> quite understand.

Yeah, maybe it is possible to get faster code and to refactor the many
binary searches we have, but I will leave that for later or for other
people interested in these topics, if you let me.

> > +static int register_replace_ref(const char *refname,
> > +				const unsigned char *sha1,
> > +				int flag, void *cb_data)
> > +{
> > +	/* Get sha1 from refname */
> > +	const char *slash = strrchr(refname, '/');
> > +	const char *hash = slash ?
> > +		slash + 1 : refname;
> > +	struct replace_object * repl_obj = xmalloc(sizeof(*repl_obj));
>
> 	struct replace_object *repl_obj = ...

Ok.

> > +	if (strlen(hash) != 40 || get_sha1_hex(hash, repl_obj->sha1[0])) {
> > +		free(repl_obj);
> > +		warning("bad replace ref name: %s", refname);
> > +	}
> > +
> > +	/* Copy sha1 from the read ref */
> > +	hashcpy(repl_obj->sha1[1], sha1);
>
> Upon an error, you free and warn and then still copy into it?

Oops, I forgot a "return 0;" statement after the warning.

> > +	/* Register new object */
> > +	if (register_replace_object(repl_obj, 1))
> > +		warning("duplicate replace ref: %s", refname);
>
> I'd say this is a grave error and should be reported as a repository
> corruption.

If we let people have one set of replace refs in "refs/replace/" and
another one in "refs/replace/bisect/", with the latter used only when
bisecting, then it may happen that the same commit has one ref in
"refs/replace/" and another one in "refs/replace/bisect/". In that
case we should probably not even warn, and the replace ref in
"refs/replace/bisect/" should simply win while bisecting.

But, as we don't have a mechanism to do that yet, you are right, we
should probably "die" here for now.

> > +static void prepare_replace_object(void)
> > +{
> > +	static int replace_object_prepared;
> > +
> > +	if (replace_object_prepared)
> > +		return;
> > +
> > +	for_each_replace_ref(register_replace_ref, NULL);
> > +	replace_object_prepared = 1;
> > +}
> > +
> > +/* We allow "recursive" replacement.
> > + * Only within reason, though */
> > +#define MAXREPLACEDEPTH 5
> > +
> > +const unsigned char *lookup_replace_object(const unsigned char *sha1)
> > +{
> > +	int pos, depth = MAXREPLACEDEPTH;
> > +	const unsigned char *cur = sha1;
> > +
> > +	prepare_replace_object();
> > +
> > +	/* Try to recursively replace the object */
> > +	do {
> > +		if (--depth < 0)
> > +			die("replace depth too high for object %s",
> > +			    sha1_to_hex(sha1));
> > +
> > +		pos = replace_object_pos(cur);
> > +		if (0 <= pos)
> > +			cur = replace_object[pos]->sha1[1];
> > +	} while (0 <= pos);
> > +
> > +	return cur;
> > +}
>
> Since your paradigm is to prepare the replacement table once at the
> beginning, never allowing it to be updated, I think you can update
> the table while you look it up.  E.g. if A->B and B->C exist, and A
> is asked for, you find A->B (to tentatively make cur point at B) and
> then you find B->C, and before returning you can rewrite the first
> mapping to A->C.  Later look-ups won't need to dereference the table
> twice that way.
>
> This assumes that there will be a small number of replacements, but
> that the same object can be asked for more than once during the
> process.

If we allow different sets of replace refs, for example A->B in
"refs/replace/" and B->C in "refs/replace/bisect/", then we cannot
rewrite as you suggest. We could add A->C in "refs/replace/bisect/",
so that it overrides A->B and B->C when we bisect, but we would not
gain much. So I prefer not to do that right now. Maybe later, if we
decide we really don't want to allow different sets of replace refs,
we can do what you suggest.
> > diff --git a/sha1_file.c b/sha1_file.c
> > index f08493f..4f2fd10 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -2163,10 +2163,18 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
> >  void *read_sha1_file(const unsigned char *sha1, enum object_type *type,
> >  		     unsigned long *size)
> >  {
> > -	void *data = read_object(sha1, type, size);
> > +	const unsigned char *repl = lookup_replace_object(sha1);
> > +	void *data = read_object(repl, type, size);
> > +
> > +	/* die if we replaced an object with one that does not exist */
> > +	if (!data && repl != sha1)
> > +		die("replacement %s not found for %s",
> > +		    sha1_to_hex(repl), sha1_to_hex(sha1));
> > +
> >  	/* legacy behavior is to die on corrupted objects */
> > -	if (!data && (has_loose_object(sha1) || has_packed_and_bad(sha1)))
> > -		die("object %s is corrupted", sha1_to_hex(sha1));
> > +	if (!data && (has_loose_object(repl) || has_packed_and_bad(repl)))
> > +		die("object %s is corrupted", sha1_to_hex(repl));
> > +
> >  	return data;
> >  }
>
> Later we'd need a global switch to forbid the replacement for
> connectivity walkers,

Yeah, I am slowly working on it. My next series (hopefully in a few
days), where the above errors are fixed, will include it.

> but other than that I think this is sane.
>
> I also looked at 1/ and 2/ which looked Ok.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html