Re: [PATCH] check_refname_component: Optimize

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Fri, 30 May 2014 11:47:33 +0200

On 05/30/2014 02:07 AM, Jeff King wrote:
> On Fri, May 30, 2014 at 06:43:18AM +0700, Duy Nguyen wrote:
> 
>>>> The first time we read packed_refs, check_refname_format() is called
>>>> in read_packed_refs()->create_ref_entry() as usual. If we find no
>>>> problem, we store packed_refs stat() info in maybe packed_refs.stat.
>>>> Next time we read packed_refs, if packed_refs.stat is there and
>>>> indicates that packed_refs has not changed, we can make
>>>> create_ref_entry() ignore check_refname_format() completely.
>>>
>>> I'm confused. Why would we re-open packed-refs at all if the stat
>>> information hasn't changed?
>>
>> No, not in the same process. In the next run.
> 
> Ah, I thought "packed_refs.stat" was a struct member, but I guess you
> mean it as a filename.
> 
> But then we're just trusting that the "trust me" flag on disk is
> correct. Why not just trust that packed-refs is correct in the first
> place?
> 
> IOW, consider this progression of changes:
> 
>   1. Check refname format when we read packed-refs (the current
>      behavior).
> 
>   2. Keep a separate file "packed-refs.stat" with stat information. If
>      the packed-refs file matches that stat information, do not bother
>      checking refname formats.
> 
>   3. Put a flag in "packed-refs" that says "trust me, I'm valid". Check
>      the refnames when it is generated.
> 
>   4. Realize that we already check the refnames when we write it out.
>      Don't bother writing "trust me, I'm valid"; readers can assume that
>      it is.
> 
> What is the scenario that option (2) protects against that options (3)
> and (4) do not?
> 
> I could guess something like "the writer has a different idea of what a
> valid refname is than we do". But that applies as well to (2), but just
> as "the reader who wrote packed-refs.stat has a different idea than we
> do".

If we want to be robust to future changes to refname rules, we could add
a header flag like

    # pack-refs with: peeled fully-peeled check-level=1.0

which promises that the reference names in the file conform to the
current ("version 1.0") check_refname_format() rules.
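
As a rough illustration (Python for brevity, not a patch), a reader might
pull such a trait out of the header line as sketched below. The
"check-level" trait is only the proposal above; the function name and
the (major, minor) tuple representation are assumptions made for this
sketch.

```python
HEADER_PREFIX = "# pack-refs with:"
CHECK_LEVEL_KEY = "check-level="  # hypothetical trait proposed in this mail


def parse_packed_refs_header(line):
    """Return (traits, check_level) for a packed-refs header line.

    traits is the set of plain trait words (e.g. "peeled").
    check_level is a (major, minor) tuple, or None when the header has
    no usable check-level trait or the line is not a header at all.
    """
    traits = set()
    check_level = None
    if not line.startswith(HEADER_PREFIX):
        return traits, check_level
    for token in line[len(HEADER_PREFIX):].split():
        if token.startswith(CHECK_LEVEL_KEY):
            major, _, minor = token[len(CHECK_LEVEL_KEY):].partition(".")
            try:
                check_level = (int(major), int(minor or 0))
            except ValueError:
                # Malformed value: treat it as "no promise made".
                check_level = None
        else:
            traits.add(token)
    return traits, check_level
```

A header without the trait (as written by today's software) simply
yields no check level, so such a file would be verified as usual.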

If we ever make the rules stricter (a "backwards-compatible" change), we
would increment the check-level to 1.1.  That way, an old reader, who
knows about check-level=1.0 but not check-level=1.1, can still trust
that the refnames in the file conform to its check_refname_format()
rules and avoid the verification step.  (Of course, if that older
version rewrites the file, it would have to write check-level=1.0, and
newer software would then be forced to validate on reading to be sure
that the refnames still conform to check-level=1.1.)

If we make the rules looser (a "backwards-incompatible" change), we
would increment the check-level to 2.0.  In that case readers who only
know about check-level 1.x would have to turn their verification code
back on when reading the file to ensure that they can work with the
refnames that it contains.
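
The reader-side rule described in the last two paragraphs could be
sketched roughly as follows (again Python for brevity). The function
name and the (major, minor) tuples are assumptions for illustration. The
sketch is deliberately conservative: it only trusts a file with the same
major level as the reader.

```python
def can_skip_verification(file_level, reader_level):
    """Decide whether a reader may trust the refnames in a file.

    Both levels are (major, minor) tuples; file_level is None when the
    file makes no check-level promise.
    """
    if file_level is None:
        return False  # no promise in the header: always verify
    file_major, file_minor = file_level
    reader_major, reader_minor = reader_level
    if file_major != reader_major:
        # E.g. a 1.x reader seeing 2.0 (looser rules): verify.
        return False
    # Within a major level, a higher minor means stricter rules, so a
    # file at least as strict as the reader can be trusted.
    return file_minor >= reader_minor
```

So a 1.0 reader may skip the check for a 1.1 file, while a 1.1 reader
has to verify a file that was rewritten with check-level=1.0.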

Format changes should be infrequent enough, and the cost of verification
low enough, that occasionally ping-ponging back and forth between
software versions shouldn't be a problem in practice.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/