Re: [PATCH/RFC] gitperformance: add new documentation about git performance tuning

On Tue, Apr 4, 2017 at 12:39 AM, Eric Wong <e@xxxxxxxxx> wrote:
> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>> On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong <e@xxxxxxxxx> wrote:
>> > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>> >>  - Should we be covering good practices for your repo going forward to
>> >>    maintain good performance? E.g. don't have some huge tree all in
>> >>    one directory (use subdirs), don't add binary (rather
>> >>    un-delta-able) content if you can help it etc.
>> >
>> > Yes, I think so.
>>
>> I'll try to write something up.
>>
>> > I think avoiding ever growing ChangeLog-type files should also
>> > be added to things to avoid.
>>
>> How were those bad specifically? They should delta quite well, it's
>> expensive to commit large files but no more because they're
>> ever-growing.
>
> It might be blame/annotate specifically, I was remembering this
> thread from a decade ago:
>
>   https://public-inbox.org/git/4aca3dc20712110933i636342fbifb15171d3e3cafb3@xxxxxxxxxxxxxx/T/

I did some basic testing on this, and I don't think advice about
ChangeLog-style files is worth including. On gcc.git, blame on
ChangeLog still takes a few hundred MB of RAM, but finishes in about
2s on my machine. That gcc/fold-const.c file takes ~10s for me,
although that thread seems to have resulted in some patches to
git-blame.

Running this:

    parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" &&
        git log --oneline {} | wc -l | tr "\n" "\t" &&
        wc -l {} | tr "\n" "\t" &&
        echo {}' ::: $(git ls-files) | tee /tmp/git-blame-times.txt

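On git.git, this shows that the slowest blames are just files with
either lots of commits, or lots of lines, or some combination of the
two. Each output line is the blame wall-clock time, the number of
commits touching the file, the "wc -l" line count (which repeats the
file name), and the file name. The gcc.git repo has some more
pathological cases; the top 10 on that repo: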

$ parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" &&
    git log --oneline {} | wc -l | tr "\n" "\t" &&
    wc -l {} | tr "\n" "\t" &&
    echo {}' ::: $(git ls-files | grep -e ^gcc/ -e ChangeLog | grep -v '/.*/') |
    tee /tmp/gcc-blame-times.txt
$ sort -nr /tmp/gcc-blame-times.txt |head -n 10
0:18.12   1513  14517 gcc/tree.c        gcc/tree.c
0:17.35  66336   7435 gcc/ChangeLog     gcc/ChangeLog
0:16.87   1634  30455 gcc/dwarf2out.c   gcc/dwarf2out.c
0:16.76   1160   7937 gcc/varasm.c      gcc/varasm.c
0:16.36   1692   5491 gcc/tree.h        gcc/tree.h
0:15.34     94    493 gcc/xcoffout.c    gcc/xcoffout.c
0:15.22     54    194 gcc/xcoffout.h    gcc/xcoffout.h
0:15.12    964   9224 gcc/reload1.c     gcc/reload1.c
0:14.90   1593   2202 gcc/toplev.c      gcc/toplev.c
0:14.66     11     43 gcc/typeclass.h   gcc/typeclass.h

This makes it pretty clear that blame is slow where you'd expect, not
specifically with files that are prepended or appended to.
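
A side note on reading those result files: GNU time documents %E as
[hours:]minutes:seconds, so a plain "sort -nr" only orders the lines
correctly while every blame finishes in under an hour. The following is a
rough sketch, not something that was run for the numbers above, that
converts the first field to seconds before sorting. The helper name
to_seconds is made up for illustration.

```shell
# Convert the leading [h:]m:ss.cc field written by `/usr/bin/time -f %E`
# into plain seconds. The remaining fields are passed through, re-joined
# with tabs. Blank lines are skipped.
to_seconds() {
    LC_ALL=C awk -v OFS='\t' 'NF {
        n = split($1, t, ":")
        if (n == 3)
            secs = t[1] * 3600 + t[2] * 60 + t[3]   # h:mm:ss
        else
            secs = t[1] * 60 + t[2]                 # m:ss.cc
        $1 = sprintf("%.2f", secs)
        print
    }'
}

# Usage (file name taken from the command above):
#   to_seconds </tmp/gcc-blame-times.txt | sort -k1,1nr | head -n 10
```

With the time in seconds, the other columns can be ranked the same way,
e.g. "sort -k2,2nr" for commit count.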


>> One issue with e.g. storing logs (I keep my IRC logs in git) is that
>> if you're constantly committing large (text) files without repack your
>> .git grows by a *lot* in a very short amount of time until a very
>> expensive repack, so now I split my IRC logs by month.
>
> Yep, that too; as auto GC is triggered by the number of loose
> objects, not the size/packability of them.
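
To make that concrete, here is a minimal sketch of the effect, assuming
nothing beyond stock git: it commits an ever-growing text file a few times
in a throwaway repo and prints "git count-objects -v" before and after a
manual gc. The function name, the file name irc.log, the chunk size and the
number of commits are all invented for the demo. gc.auto is set to 0 only
so that auto gc cannot interfere; its default threshold is about 6700 loose
objects, which a repo like this stays far below while "size" keeps
climbing.

```shell
# Sketch: repeated commits of a growing file pile up as full-size loose
# blobs until something repacks them.
demo_loose_growth() {
    repo=$(mktemp -d) || return 1
    git -C "$repo" init -q
    git -C "$repo" config user.name demo
    git -C "$repo" config user.email demo@example.invalid
    git -C "$repo" config commit.gpgsign false
    git -C "$repo" config gc.auto 0     # keep auto gc out of the picture

    i=0
    while [ "$i" -lt 20 ]; do
        # Append roughly 64 KiB of text that zlib cannot shrink much.
        head -c 49152 /dev/urandom | base64 >>"$repo/irc.log"
        git -C "$repo" add irc.log
        git -C "$repo" commit -q -m "log $i"
        i=$((i + 1))
    done

    echo "before gc:"
    git -C "$repo" count-objects -v | grep -E '^(count|size):'
    git -C "$repo" gc --quiet
    echo "after gc:"
    git -C "$repo" count-objects -v | grep -E '^(count|size|size-pack):'
    rm -rf "$repo"
}

demo_loose_growth
```

"count" is the number of loose objects and "size" their disk usage in KiB.
Each commit here adds only three loose objects (blob, tree, commit), but
every blob is a complete copy of the file so far, which is why splitting
such logs into smaller files keeps the growth between repacks in check.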



