Re: Commit cache to speed up rev-list and merge

On Sun, Sep 30, 2012 at 7:05 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> On Mon, Oct 1, 2012 at 8:49 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> On Thu, Sep 27, 2012 at 7:14 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
>>> On Thu, Sep 27, 2012 at 10:51 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>>>> In Linus' Linux kernel tree there are currently about 323,178 commits.
>>>> If we store just the pre-parsed commit time as an int32 field this is
>>>> an additional 1.2 MiB of data in the pack-*.idx file, assuming we can
>>>> use additional data like pack offset position to correlate commit to
>>>> the parsed int. If we stored parent pointers in a similar way you
>>>> probably need at least 3.6 MiB of additional disk space on the index.
>>>> For example, use 12 bytes for each commit to store enough of the
>>>> parsed commit time to sort commits, and up to 2 parent pointers per
>>>> commit, with a reserved magic value for octopus merges to mean the
>>>> commit itself has to be parsed to get the graph structure correct.
>>>
>>> This is much better than my naive approach (storing sha-1 and
>>> timestamps). We could use less space by storing parent pointers of
>>> non-merge commits only. In linux-2.6, merge commits are 6% of all
>>> commits; git.git has a higher percentage, 21%. I bet many projects
>>> merge even less, with merge commits under 5% of the total.
>>
>> Some projects merge quite often. Android's frameworks/base repository
>> has a very large number of merges. Out of 79905 commits reachable from
>> the master branch, 65.3% are merges. So actually there are more merge
>> commits in the Android history than there are code commits. A cache of
>> only non-merges may be worthless on such a history.
>
> The good thing about this cache is that it's configurable. Merge-heavy
> projects can choose to cache the first two parents. Projects that rarely
> merge can choose to cache just the first parent. We don't need a fixed
> format for both.

Git has enough magic switches. It doesn't need yet another magic
switch that one group of users needs to set, and another can safely
ignore because their project's usage just happens to align with Linus
Torvalds' current world view.