Re: Tag peeling peculiarities

Jeff King <peff@xxxxxxxx> · Sat, 16 Mar 2013 05:34:41 -0400

On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote:

> My patch series is nearly done.  I will need another day or two to
> review and make it submission-ready, but I wanted to give you an idea of
> what I'm up to and I could also use your feedback on some points.

I was just sending out my cleaned up patches to do do fully-peeled, too.
I think the duplication is OK, though, as your topic would be for
"master" and mine can go to "maint".

> * Implement fully-peeled packed-refs files, as discussed upthread: peel
>   references outside of refs/tags, and keep rigorous track of
>   which references have REF_KNOWS_PEELED.

Looks pretty similar to mine. We even added similar tests.

I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel
line" optimization. I didn't bother, because this will never be
triggered by a git-written file (either we do not write the entry, or we
set fully-peeled). I wonder if any other implementation does peel every
ref, though.

> * Change pack-refs to use the peeled information from ref_entries if it
>   is available, rather than looking up the references again.

I don't know that it matters from a performance standpoint (we don't
really care how long pack-refs takes, as long as it is within reason,
because it is run as part of a gc).  But it would mean that any errors
in the file are propagated from one run of pack-refs to the next. I
don't know if we want to spend the extra time to be more robust.

> * repack_without_ref() writes peeled refs to the packed-refs file.
>   Use the old values if available; otherwise look them up.

Whereas here we probably _do_ want the performance optimization, because
we do care about the performance of a ref deletion.

> 1. There are multiple functions for peeling refs:
>    peel_ref() (and peel_object(), which was extracted from it);
>    peel_entry() (derived from pack-refs.c:pack_one_ref()).  It would be
>    nice to combine these into the One True Function.  But given the
>    problem that you mentioned above (which is rooted in parts of the
>    code that I don't know) I don't know what that version should do.

I'm not sure I understand the question. Just skimming your patches, it
looks like peel_entry could just call peel_object?

> 2. repack_without_ref() now writes peeled refs, peeling them if
>    necessary.  It does this *without* referring to the loose refs
>    to avoid having to load all of the loose references, which can be
>    time-consuming.  But this means that it might try to lookup SHA1s
>    that are not the current value of the corresponding reference any
>    more, and might even have been garbage collected.

Yuck. I really wonder if repack_without_ref should literally just be
writing out the exact same file, minus the lines for the deleted ref.
Is there a reason we need to do any processing at all? I guess the
answer is that you want to avoid re-parsing the current file, and just
write it out from memory. But don't we need to refresh the memory cache
from disk under the lock anyway? That was the pack-refs race that I
fixed recently.

> Is the code that
>    I use to do this, in peel_entry(), safe to call for nonexistent
>    SHA1s (I would like it to return 0 if the SHA1 is invalid)?:
> 
> 	o = parse_object(entry->u.value.sha1);
> 	if (o->type == OBJ_TAG) {
> 		o = deref_tag(o, entry->name, 0);
> 		if (o) {
> 			hashcpy(entry->u.value.peeled, o->sha1);
> 			entry->flag |= REF_KNOWS_PEELED;
> 			return 1;
> 		}
> 	}
> 	return 0;

I think this approach is safe, as parse_object will silently return NULL
for a missing object. You do need to check for "o != NULL" in your
conditional, though.

> 3. This same change to repack_with_ref() means that it could become
>    significantly slower to delete a packed ref if the packed-ref file
>    is not fully-peeled.  On the plus side, once this is done once, the
>    packed-refs files will be rewritten in fully-peeled form.  But if
>    two versions of Git are being used in a repository, this cost could
>    be incurred repeatedly.  Does anybody object?

I think it's OK in concept. But I still am wondering if it would be
simpler still to just pass the file through while locked.

> 4. Should I change the locking policy as discussed in this thread?:
> 
>        http://article.gmane.org/gmane.comp.version-control.git/212131

I think it's overall a sane direction. It does increase lock contention
slightly when two processes are deleting at the same time, but it would
fix the weird corner cases I described (mostly deleted refs reappearing
due to races). And the lock contention is already there in a
fully-packed repo anyway. I.e., right now we read the packed-refs file
and lock it if our to-delete ref is in there; with the proposed change,
we would lock before even reading it. So the increased contention is
only when two deleters race each other, _and_ one of them is not
deleting a packed ref.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html