Re: [PATCH v3] refs: Use binary search to lookup refs faster

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Sun, 02 Oct 2011 07:15:47 +0200

On 09/30/2011 06:38 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> On 09/30/2011 01:48 AM, Junio C Hamano wrote:
>>> This version looks sane, although I have a suspicion that it may have
>>> some interaction with what Michael may be working on.
>>
>> Indeed, I have almost equivalent changes in the giant patch series that
>> I am working on [1].
> 
> Good; that was the primary thing I wanted to know.  I want to take
> Julian's patch early but if the approach and data structures were
> drastically different from what you are cooking, that would force
> unnecessary reroll on your part, which I wanted to avoid.

Um, well, my patch series includes the same changes that Julian's wants
to introduce, but following lots of other changes, cleanups,
documentation improvements, etc.  Moreover, my patch series builds on
mh/iterate-refs, with which Julian's patch conflicts.  In other words,
it would be a real mess to reroll my series on top of Julian's patch.
(That is of course not to imply that I hold a mutex on refs.c.)  Because
it changes a data structure that is used throughout refs.c, changes a
lot of lines of code.

I think that the switch from linked list + linear sort to array plus
binary sort is a pretty obvious win in terms of code complexity and
*potential* performance improvement, but empirically I haven't seen any
claims that it brings performance improvements beyond "René's patch".
(Though, honestly, I've lost track of which "René's patch" is being
discussed and I don't see anything relevant in Junio's tree.)

Intuitively, given that populating the reference cache involves O(N)
I/O, speeding up lookups can only help if there are very many ref
lookups within a single git invocation.  I think we will get a better
improvement by avoiding the reading of unneeded loose refs by reading
them one subdirectory at a time instead of always reading them en masse.
 I wanted to reach that milestone before submitting my changes.

My preference would be:

1. Merge jp/get-ref-dir-unsorted, perhaps even into maint.  It is a
simple, noninvasive, and obvious improvement and helps performance a lot
in an important use case.

2. Hold off on merging jp/get-ref-dir-unsorted for a while to give me a
chance to avoid conflict hell.

3. Evaluate René's patch on its own merits; if it makes sense regardless
of the binary search speedups, then it can be accepted independently to
give most of the performance benefit already.

Are there any other patches in this area that I've forgotten?

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html