Re: [PATCH 04/19] index-helper: new daemon for caching index and related stuff

On Thu, Mar 10, 2016 at 6:09 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> David Turner <dturner@xxxxxxxxxxxxxxxx> writes:
>
>> From: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
>>
>> Instead of reading the index from disk and worrying about disk
>> corruption, the index is cached in memory (memory bit-flips happen
>> too, but hopefully less often). The result is a faster read: read
>> time is reduced by 70%.
>>
>> The biggest gain is not having to verify the trailing SHA-1, which
>> takes lots of time especially on large index files. But this also
>> opens doors for further optimizations:
>>
>>  - we could create an in-memory format that's essentially the memory
>>    dump of the index to eliminate most of parsing/allocation
>>    overhead. The mmap'd memory can be used straight away. Experiment
>>    [1] shows we could reduce read time by 88%.
>>
>>  - we could cache non-index info such as name hash
>>
>> The shared memory's name follows the template "git-<object>-<SHA1>"
>> where <SHA1> is the trailing SHA-1 of the index file. <object> is
>> "index" for cached index files (and may be "name-hash" for name-hash
>> cache). If such shared memory exists, it contains the same index
>> content as on disk. The content is already validated by the daemon and
>> git won't validate it again (except comparing the trailing SHA-1s).
>
> This indeed is an interesting approach; what is not explained but
> must be is when the on-disk index is updated to reflect the reality
> (if I am reading the explanation and the code right, while the
> daemon is running, its in-core cache becomes the source of truth by
> forcing everybody's read-index-from() to go to the daemon).  The
> explanation could be "this is only for read side, and updating the
> index happens via the traditional 'write a new file and rename it to
> the final place' codepath, at which time the daemon must be told to
> re-read it."
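
For readers skimming the thread, the naming template and the two levels of
checking described in the quoted commit message can be sketched roughly as
follows. This is an illustration in Python, not code from the patch: the
helper names are hypothetical, and rendering <SHA1> as lowercase hex is an
assumption. The one fact relied on is that the index file ends with a
20-byte SHA-1 computed over everything before it.

```python
import hashlib

SHA1_LEN = 20  # size in bytes of the SHA-1 that trails an index file


def trailing_sha1(index_bytes: bytes) -> bytes:
    """Return the last 20 bytes of an index image (its trailing SHA-1)."""
    if len(index_bytes) < SHA1_LEN:
        raise ValueError("index too short to carry a trailing SHA-1")
    return index_bytes[-SHA1_LEN:]


def shm_name(obj: str, sha1: bytes) -> str:
    """Build a name following the "git-<object>-<SHA1>" template.

    Assumption: <SHA1> is written as lowercase hex.
    """
    return f"git-{obj}-{sha1.hex()}"


def validate_index(index_bytes: bytes) -> bool:
    """Expensive check: hash the whole body and compare with the trailer.

    In the scheme described above, only the daemon pays this cost.
    """
    body = index_bytes[:-SHA1_LEN]
    return hashlib.sha1(body).digest() == trailing_sha1(index_bytes)


def cached_copy_matches(disk_bytes: bytes, cached_bytes: bytes) -> bool:
    """Cheap check on the git side: compare the two trailers only."""
    return trailing_sha1(disk_bytes) == trailing_sha1(cached_bytes)
```

The point of the split is that validate_index() is linear in the size of
the index, while cached_copy_matches() only ever looks at 40 bytes.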

Another aspect that's not mentioned is that we keep the daemon's logic as
thin as possible. The "brain" stays in git. So the daemon can read and
validate stuff, but that's about all it's allowed to do. It's not
supposed to add/create new contents. It's not even allowed to accept
direct updates from git.
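
To make that division of labour concrete, here is a rough Python sketch of
the flow Junio describes: git writes a new file and renames it into place,
then tells the daemon to re-read. None of this is the patch's code.
ThinIndexCache, write_index and the notify callback are hypothetical names,
and the callback merely stands in for whatever mechanism is used to poke
the daemon. Note that the cache object has no method that accepts content
from a caller; it can only re-read and validate what is on disk.

```python
import hashlib
import os
import tempfile

SHA1_LEN = 20  # size in bytes of the trailing SHA-1


class ThinIndexCache:
    """Hypothetical stand-in for the daemon: it only reads and validates."""

    def __init__(self, index_path):
        self.index_path = index_path
        self.cached = None  # validated index image, or None

    def refresh(self):
        """Re-read the on-disk index; keep it only if the trailer checks out."""
        with open(self.index_path, "rb") as f:
            data = f.read()
        body, trailer = data[:-SHA1_LEN], data[-SHA1_LEN:]
        if len(data) < SHA1_LEN or hashlib.sha1(body).digest() != trailer:
            self.cached = None  # never serve unvalidated content
            return False
        self.cached = data
        return True


def write_index(index_path, body, notify):
    """Git side: write a new file, rename it into place, then notify."""
    data = body + hashlib.sha1(body).digest()
    dir_name = os.path.dirname(os.path.abspath(index_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, index_path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    notify()  # e.g. cache.refresh in this sketch
```

A caller would wire the two together with something like
write_index(path, body, cache.refresh), so the only way new content reaches
the cache is through the file on disk.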
-- 
Duy