On Fri, Dec 14, 2018 at 2:48 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Fri, Dec 14 2018, Clement Moyroud wrote:
>
> > My group at work is migrating a CVS repo to Git. The biggest issue we
> > face so far is the performance of git blame, especially compared to
> > CVS on the same file. One file especially causes us trouble: it's a
> > 30k lines file with 25 years of history in 3k+ commits. The complete
> > repo has 200k+ commits over that same period of time.
>
> There's a real-world repo with a shape & size very similar to this that
> has good performance, gcc.git: https://github.com/gcc-mirror/gcc
>
>     $ wc -l ChangeLog
>     20240 ChangeLog
>     $ git log --oneline -- ChangeLog | wc -l
>     2676
>     $ git log --oneline | wc -l
>     165309
>     $ time git blame ChangeLog >/dev/null
>
>     real    0m1.977s
>     user    0m1.909s
>     sys     0m0.069s
>
> Its history began in 1997, and the changes to the ChangeLog file by its
> nature is fairly evenly spread through that period.
>
> So check out that repo to see if you have similar or worse
> performance. Does your work repo show the same problem with a history
> produced with 'git fast-export --anonymize', and if so is that something
> you'd be OK with sharing?

Hi Ævar,

I see around 3s here on the GCC repo, but I'm on a VM and the repo is
cloned on an NFS disk, so I'd say it matches :) It's around 45x faster
than my repo, on the same NFS share and VM. So there's definitely
something to improve here on my end (see my reply to Bryan re: repack in
a separate e-mail).

The anonymized export won't work in that case: all file contents are
replaced with 'anonymous blob <n>', so there's no per-line history for
blame to follow. Let me see if I can post-process a non-anonymized
version to keep the relevant data available.

Cheers,

Clément