Re: [PATCH] check_refname_component: Optimize

On Fri, May 30, 2014 at 06:43:18AM +0700, Duy Nguyen wrote:

> >> The first time we read packed_refs, check_refname_format() is called
> >> in read_packed_refs()->create_ref_entry() as usual. If we find no
> >> problem, we store packed_refs stat() info in maybe packed_refs.stat.
> >> Next time we read packed_refs, if packed_refs.stat is there and
> >> indicates that packed_refs has not changed, we can make
> >> create_ref_entry() ignore check_refname_format() completely.
> >
> > I'm confused. Why would we re-open packed-refs at all if the stat
> > information hasn't changed?
> 
> No, not in the same process. In the next run.

Ah, I thought "packed_refs.stat" was a struct member, but I guess you
mean it as a filename.

But then we're just trusting that the "trust me" flag on disk is
correct. Why not just trust that packed-refs is correct in the first
place?

IOW, consider this progression of changes:

  1. Check refname format when we read packed-refs (the current
     behavior).

  2. Keep a separate file "packed-refs.stat" with stat information. If
     the packed-refs file matches that stat information, do not bother
     checking refname formats.

  3. Put a flag in "packed-refs" that says "trust me, I'm valid". Check
     the refnames when it is generated.

  4. Realize that we already check the refnames when we write it out.
     Don't bother writing "trust me, I'm valid"; readers can assume that
     it is.

What is the scenario that option (2) protects against that options (3)
and (4) do not?

I could guess something like "the writer has a different idea of what a
valid refname is than we do". But that applies equally to (2), just in
the form "the reader who wrote packed-refs.stat has a different idea
than we do".
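
To make option (2) concrete, here is a rough sketch in Python rather than git's C. Everything in it is an assumption for illustration: the JSON sidecar layout, the helper names, and especially is_valid_refname(), which is a simplified stand-in for check_refname_format() and not git's real rules.

```python
import json
import os


def stat_signature(path):
    # Fields a reader might compare to decide "unchanged" (hypothetical choice).
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "ino": st.st_ino}


def is_valid_refname(name):
    # Simplified stand-in for check_refname_format(); NOT git's actual rules.
    return bool(name) and ".." not in name and " " not in name \
        and not name.endswith("/")


def read_refs(packed_path, sidecar_path):
    """Return (refs, checked); checked is False when validation was skipped."""
    sig = stat_signature(packed_path)
    trusted = False
    if os.path.exists(sidecar_path):
        try:
            with open(sidecar_path) as f:
                trusted = json.load(f) == sig
        except ValueError:
            trusted = False  # unreadable sidecar: fall back to full checking

    refs = {}
    with open(packed_path) as f:
        for line in f:
            line = line.rstrip("\n")
            # Skip the header line and peeled-tag lines.
            if not line or line.startswith("#") or line.startswith("^"):
                continue
            sha, name = line.split(" ", 1)
            if not trusted and not is_valid_refname(name):
                raise ValueError("bad refname: %r" % name)
            refs[name] = sha

    if not trusted:
        # Everything passed, so record the signature for the next run.
        with open(sidecar_path, "w") as f:
            json.dump(sig, f)
    return refs, not trusted
```

Note that the second run skips validation purely because the sidecar says so. Whoever wrote the sidecar decided what "valid" means, which is the same trust that options (3) and (4) place in the writer of packed-refs.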

As a side note, while it is nice that we might make check_refname_format
faster, I think if you _really_ want to make repos with a lot of refs
faster, it would make more sense to introduce an on-disk format that
does not need linear parsing (e.g., something we could mmap and binary
search, or even something dbm-ish that could be updated without
rewriting the whole file). Deletions, for example, must rewrite the
whole file, giving quadratic performance when deleting all refs one by
one.
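
As a rough illustration of the "mmap and binary search" idea, here is a minimal Python sketch. The fixed-width record layout (a 64-byte NUL-padded name followed by a 40-character hex object name) and the function names are made up for this example; this is not an existing or proposed git format.

```python
import mmap
import os

NAME_LEN = 64              # hypothetical fixed width for the refname field
SHA_LEN = 40               # hex object name
REC_LEN = NAME_LEN + SHA_LEN


def write_index(path, refs):
    """Write refs as fixed-width records sorted by padded name bytes."""
    records = []
    for name, sha in refs.items():
        key = name.encode()
        if len(key) > NAME_LEN or len(sha) != SHA_LEN:
            raise ValueError("record does not fit the fixed layout: %r" % name)
        records.append(key.ljust(NAME_LEN, b"\0") + sha.encode())
    records.sort()  # same byte order the lookup compares with
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)


def lookup(path, name):
    """Binary-search the mapped file; touches O(log n) records, no parsing."""
    key = name.encode().ljust(NAME_LEN, b"\0")
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return None  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            lo, hi = 0, len(m) // REC_LEN
            while lo < hi:
                mid = (lo + hi) // 2
                off = mid * REC_LEN
                rec_key = m[off:off + NAME_LEN]
                if rec_key < key:
                    lo = mid + 1
                elif rec_key > key:
                    hi = mid
                else:
                    return m[off + NAME_LEN:off + REC_LEN].decode()
    return None
```

A single lookup only reads the records the search lands on, so startup cost no longer grows linearly with the number of refs. This layout does nothing for the deletion problem, though; removing a record still means rewriting the file, which is where something dbm-ish would help.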

-Peff