Re: [PATCH 14/33] refs: extract a function peel_entry()

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Wed, 17 Apr 2013 00:01:57 +0200

On 04/16/2013 07:55 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> On 04/15/2013 07:38 PM, Junio C Hamano wrote:
>>> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
>>>
>>>>  	if (read_ref_full(refname, base, 1, &flag))
>>>>  		return -1;
>>>>  
>>>> -	if ((flag & REF_ISPACKED)) {
>>>> +	/*
>>>> +	 * If the reference is packed, read its ref_entry from the
>>>> +	 * cache in the hope that we already know its peeled value.
>>>> +	 * We only try this optimization on packed references because
>>>> +	 * (a) forcing the filling of the loose reference cache could
>>>> +	 * be expensive and (b) loose references anyway usually do not
>>>> +	 * have REF_KNOWS_PEELED.
>>>> +	 */
>>>> +	if (flag & REF_ISPACKED) {
>>>>  		struct ref_entry *r = get_packed_ref(refname);
>>>
>>> This code makes the reader wonder what happens when a new loose ref
>>> masks a stale packed ref, but the worry is unfounded because the
>>> read_ref_full() wouldn't have gave us REF_ISPACKED in the flag in
>>> such a case.
>>>
>>> But somehow the calling sequence looks like such a mistake waiting
>>> to happen.  It would be much more clear if a function that returns a
>>> "struct ref_entry *" is used instead of read_ref_full() above, and
>>> we checked (r->flag & REF_ISPACKED) in the conditional, without a
>>> separate get_packed_ref(refname).
>>
>> As I'm sure you realize, I didn't change the code that you are referring
>> to; I just added a comment.
> 
> Yes.
> 
>> But yes, I sympathize with your complaint.  Additionally, the code has
>> the drawback that get_packed_ref() is called twice: once in
>> read_ref_full() and again in the if block here.  Unfortunately, this
>> isn't so easy to fix because read_ref_full() doesn't use the loose
>> reference cache, so the reference that it returns might not even have a
>> ref_entry associated with it (specifically, unless the returned flag
>> value has REF_ISPACKED set).  So there are a couple options:
>>
>> * Always read loose references through the cache; that way there would
>> always be a ref_entry in which the return value could be presented.
>> This would not be a good idea at the moment because the loose reference
>> cache is populated one directory at a time, and reading a whole
>> directory of loose references could be expensive.  So before
>> implementing this, it would be advisable to change the code to populate
>> the loose reference cache more selectively when single loose references
>> are needed.  -> This approach would be well beyond the scope of this
>> patch series.
>>
>> * Implement a function like read_ref_full() with an additional (struct
>> ref_entry **entry) argument that is written to *in the case* that the
>> reference that was returned has a ref_entry associated with it, and NULL
>> otherwise.  This would have to be an internal function because we don't
>> want to expose the ref_entry structure outside of refs.c.
>> read_ref_full() would be implemented on top of the new function.
> 
> Isn't there another?
> 
> Instead of using read_ref_full() at this callsite, can it call a
> function, given a refname, returns a pointer to "struct ref_entry",
> using the proper "a loose ref covers the packed ref with the same
> name" semantics?
> 
> I guess that may need the same machinery for your first option to
> implement it efficiently; right now read_ref_full() goes directly to
> the single file that would hold the named ref without scanning and
> populating sibling refs in the in-core loose ref hierarchy, and
> without doing so you cannot return a struct ref_entry.  Hmm...

Yes, that's the crux.  Without the ref_cache, there is no persistent
ref_entry that can be returned to the caller.

Somewhere I have a prototype patch series (never submitted) that would
make it reasonable to access all references via the cache.  For single
reference accesses, it wouldn't read all of the loose references in the
whole directory but rather only create stub REF_INCOMPLETE ref_entries
for the refs that are not needed immediately.  These entries would be
filled when they are accessed.  There was also going to be a bulk read
command to be used when it is expected that the whole directory will be
needed.  With this change we really would have a ref_entry for every
reference and your suggestion would become plausible.  Someday...

But for the current issue it would be totally sufficient to return an
additional pointer to the ref_entry *if it is available*, because for
packed refs it will always be available.

Actually, for the current patch series I don't see the urgency of
changing *anything*.  The code works as is and is no worse than the old
code.  I'd rather do this change (if I get to it) in a separate series.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html