Re: Tag peeling peculiarities

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 15 Mar 2013 06:12:10 +0100

On 03/14/2013 02:40 PM, Jeff King wrote:
> On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote:
> 
>> Perhaps if peel_ref() *were* 100% reliable, we might be able to use it
>> to avoid object lookups in some other places.
> 
> In theory, some of the many uses of deref_tag could be adopted. However,
> we do not always have the refname handy at that point (and I believe
> peel_ref's optimization only kicks in during for_each_ref traversals
> anyway).
> 
> It may still be a win to check the packed-refs file before peeling a
> random sha1, as looking up there should be cheaper than actually loading
> the object. But right now, the way the optimization is used is always
> O(1) to just check the last ref loaded. With your recent ref
> refactoring, I think we should be able to do lookups in O(lg n).

There are more conceptual problems in the handling of peeled references
and current_ref.  I want to document them here for future reference and
to get other people thinking about whether other parts of the code might
be making wrong assumptions about this stuff.

What is stored in ref_value.peeled?  Is it the peeled version of
ref_value.sha1, or is it the peeled version of the associated refname?
Because they are not necessarily the same thing: an entry in the packed
ref_cache *might* be overridden by a loose reference with the same
refname but a different SHA1 which peels to a different value.

Obviously we cannot always guarantee that ref_value.peeled is the peeled
version of the refname, because it is read from a packed-refs file that
might be old.  So the only sane policy is that ref_value.peeled should
be the peeled version of ref_value.sha1 regardless of whether the
refname now points elsewhere.

It follows that the only time we can be sure that ref_value.peeled is
the correct peeled version of refname is if we know that there is no
loose reference overriding it.

This leads to some subtleties when reading and writing ref_value.peeled.

When reading ref_value.peeled:

When we iterate over references using do_for_each_ref(), then we only
present a packed ref if there is no loose ref with the same refname.  So
in that case it is OK to point current_ref at the packed ref entry, and
if somebody calls peel_ref() for the current reference then (modulo race
conditions?) current_ref->u.value.peeled is indeed the correct peeled
value for that refname.

But if we iterate over *only* the packed refs using
do_for_each_ref_in_dir() (as, for example, in repack_without_ref()),
then the iteration doesn't even look at the loose refs, so we don't know
whether the packed ref_entry being iterated over is authoritative.  But
we nevertheless set current_ref to that entry.  So if somebody would
call peel_ref() on the current refname during such an iteration, they
could get the peeled version of the packed ref rather than the correct
peeled version for the refname.

Currently, nobody calls peel_ref() during such an iteration, so I don't
think it causes any bugs.  But it is an unmarked landmine.

When setting ref_value.peeled:

When determining the value to write to ref_value.peeled (which I believe
only happens in pack-refs), we should record the peeled version of
ref_entry.sha1, and *not* the peeled version of refname.  This is indeed
what pack-refs does.  But if somebody were to get the clever and
reasonable-sounding idea of using peel_ref() in the implementation of
pack-refs.c:handle_one_ref(), then it would be easy to accidentally get
a peeled value for the current refname rather than for the SHA1 being
recorded in the file.

I am trying to change repack_without_refs() to write the peeled refs to
the packed-refs file (currently the peeled refs are lost if a packed ref
is deleted).  For this I would like to reuse the pre-peeled values from
the old version of the packed-refs file to avoid having to resolve them
all again.  So all of the issues mentioned above are coming together for me.

I think my solution will be to add a static function in refs.c that
passes pointers to the ref_entries to its callback, so that the function
has direct access to the peeled member rather than having to access it
via the information-losing peel_ref().

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html