On Tue, Apr 4, 2017 at 12:39 AM, Eric Wong <e@xxxxxxxxx> wrote:
> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>> On Mon, Apr 3, 2017 at 11:34 PM, Eric Wong <e@xxxxxxxxx> wrote:
>> > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>> >> - Should we be covering good practices for your repo going forward to
>> >> maintain good performance? E.g. don't have some huge tree all in
>> >> one directory (use subdirs), don't add binary (rather
>> >> un-delta-able) content if you can help it etc.
>> >
>> > Yes, I think so.
>>
>> I'll try to write something up.
>>
>> > I think avoiding ever growing ChangeLog-type files should also
>> > be added to things to avoid.
>>
>> How were those bad specifically? They should delta quite well, it's
>> expensive to commit large files but no more because they're
>> ever-growing.
>
> It might be blame/annotate specifically, I was remembering this
> thread from a decade ago:
>
> https://public-inbox.org/git/4aca3dc20712110933i636342fbifb15171d3e3cafb3@xxxxxxxxxxxxxx/T/

I did some basic testing on this, and I think advice about
ChangeLog-style files isn't worth including. On gcc.git, blame on
ChangeLog still takes a few hundred MB of RAM, but finishes in about 2s
on my machine. That gcc/fold-const.c file takes ~10s for me, though;
that thread seems to have resulted in some patches to git-blame.

Running this on git.git:

    parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr "\n" "\t" && echo {}' ::: $(git ls-files) | tee /tmp/git-blame-times.txt

shows that the slowest blames are just files with either lots of
commits, or lots of lines, or some combination of the two.
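
A note for anyone ranking a timings file like /tmp/git-blame-times.txt:
GNU time's %E prints [h:]m:ss.ss, and as far as I can tell a plain
numeric sort only reads the digits before the first colon, so the
ordering is only reliable while every run takes under ten minutes.
Here's a rough sketch that converts the first column to seconds before
sorting. The function name blame_times_by_seconds is made up for
illustration, and it assumes the tab-separated layout the command above
writes:

```shell
#!/bin/sh
# Hypothetical helper: prefix each line with the elapsed time in seconds
# (parsed from GNU time's %E field, [h:]m:ss[.ss], in column 1), then
# sort on that new first column, slowest first.
# Reads the named files, or stdin if none are given.
blame_times_by_seconds() {
    LC_ALL=C awk -F'\t' '
    {
        # Split "h:m:s" or "m:s" and fold it into seconds, base 60.
        n = split($1, t, ":")
        secs = 0
        for (i = 1; i <= n; i++)
            secs = secs * 60 + t[i]
        printf "%.2f\t%s\n", secs, $0
    }' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr
}
```

Usage would be something like:

    blame_times_by_seconds /tmp/git-blame-times.txt | head -n 10
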

The gcc.git repo has some more pathological cases. Top 10 on that repo
(columns: blame wall time, number of commits, "wc -l" output, path):

    $ parallel '/usr/bin/time -f %E git blame {} 2>&1 >/dev/null | tr "\n" "\t" && git log --oneline {} | wc -l | tr "\n" "\t" && wc -l {} | tr "\n" "\t" && echo {}' ::: $(git ls-files|grep -e ^gcc/ -e ChangeLog|grep -v '/.*/') | tee /tmp/gcc-blame-times.txt
    $ sort -nr /tmp/gcc-blame-times.txt |head -n 10
    0:18.12  1513   14517 gcc/tree.c       gcc/tree.c
    0:17.35  66336  7435 gcc/ChangeLog     gcc/ChangeLog
    0:16.87  1634   30455 gcc/dwarf2out.c  gcc/dwarf2out.c
    0:16.76  1160   7937 gcc/varasm.c      gcc/varasm.c
    0:16.36  1692   5491 gcc/tree.h        gcc/tree.h
    0:15.34  94     493 gcc/xcoffout.c     gcc/xcoffout.c
    0:15.22  54     194 gcc/xcoffout.h     gcc/xcoffout.h
    0:15.12  964    9224 gcc/reload1.c     gcc/reload1.c
    0:14.90  1593   2202 gcc/toplev.c      gcc/toplev.c
    0:14.66  11     43 gcc/typeclass.h     gcc/typeclass.h

Which makes it pretty clear that blame is slow where you'd expect, not
with files that are prepended or appended to.

>> One issue with e.g. storing logs (I keep my IRC logs in git) is that
>> if you're constantly committing large (text) files without repack your
>> .git grows by a *lot* in a very short amount of time until a very
>> expensive repack, so now I split my IRC logs by month.
>
> Yep, that too; as auto GC is triggered by the number of loose
> objects, not the size/packability of them.
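
To see that effect in a given repo, you can compare the loose object
count against the gc.auto threshold and look at how much disk those
loose objects take. A minimal sketch, assuming the "count:" and "size:"
lines that "git count-objects -v" prints (size is in KiB) and the
documented gc.auto default of 6700. The name loose_summary is just
something I made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: summarize `git count-objects -v` output (read on
# stdin) next to a gc.auto threshold passed as $1.
loose_summary() {
    threshold=${1:-6700}   # 6700 is the documented default for gc.auto
    awk -v limit="$threshold" '
        $1 == "count:" { count = $2 }   # number of loose objects
        $1 == "size:"  { kib = $2 }     # disk used by loose objects, KiB
        END {
            printf "loose objects: %d (gc.auto=%d), loose size: %d KiB\n",
                   count, limit, kib
        }'
}

# Typical use inside a repository:
#   git count-objects -v |
#       loose_summary "$(git config --get gc.auto || echo 6700)"
```

A repo with a handful of huge loose text files will show a small count
well under the threshold alongside a large size, which is the case
where splitting the files or repacking by hand helps.
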