Re: Git is not scalable with too many refs/*

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/01/2011 10:41 PM, Junio C Hamano wrote:
> Martin Fick <mfick@xxxxxxxxxxxxxx> writes:
>> I guess this makes sense, we invalidate the cache and have 
>> to rebuild it after every new ref is added?  Perhaps a 
>> simple fix would be to move the invalidation right after all 
>> the refs are updated?  Maybe write_ref_sha1 could take in a 
>> flag to tell it to not invalidate the cache so that during 
>> iterative updates it could be disabled and then run manually 
>> after the update?
> 
> It might make sense, on top of Julian's patch, to add a bit that says "the
> contents of this ref-array is current but the array is not sorted", and
> whenever somebody runs add_ref(), append it also to the ref-array (so that
> the contents do not have to be re-read from the filesystem) but flip the
> "unsorted" bit on. Then update look-up and iteration to sort the array
> when "unsorted" bit is on without re-reading the contents from the
> filesystem.

My WIP patch series does one better than this; it keeps track of what
part of the array is already sorted so that a reference can be found in
the sorted part of the array using binary search, and if it is not found
there a linear search is done through the unsorted part of the array.  I
also have some code (not pushed) that adds some intelligence to make the
use case

    repeat many times:
        check if reference exists
        add reference

efficient by picking optimal intervals to re-sort the array.  (This sort
can also be faster if most of the array is already sorted: sort the new
entries using qsort then merge sort them into the already-sorted part of
the list.)

But there is another reason that we cannot currently update the
reference cache on the fly rather than invalidating it after each
change: symbolic references are stored *resolved* in the reference
cache, and no record is kept of the reference that they refer to.
Therefore it is possible that the addition or modification of an
arbitrary reference can affect how a symbolic reference is resolved, but
there is not enough information in the cache to track this.

IMO the correct solution is to store symbolic references un-resolved.
Given that lookup is going to become much faster, the slowdown in
reference resolution should not be a big performance penalty, whereas
reference updating could become *much* faster.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]