On Sat, Jun 30, 2018 at 12:10:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Jun 29 2018, Mike Hommey wrote:
>
> > I noticed some slowness when fast-importing data from the Firefox Mercurial
> > repository, where fast-import spends more than 5 minutes importing ~2000
> > revisions of one particular file. I reduced a testcase while still
> > using real data. One could synthesize data with roughly the same
> > properties, but I figured real data could be useful.
> >
> > To reproduce:
> > $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> > $ git init bar
> > $ cd bar
> > $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
> >
> > [...]
> > So maybe it would make sense to consolidate the diff code (after all,
> > diff-delta.c is an old specialized fork of xdiff). With manual trimming
> > of common head and tail, this gets down to 3:33.
> >
> > I'll also note that Facebook has imported xdiff from the git code base
> > into Mercurial and improved performance on it, so it might also be worth
> > looking at what's worth taking from there.
>
> It would be interesting to see how this compares with a more naïve
> approach of committing every version of this file one-at-a-time into a
> new repository (with & without gc.auto=0). Perhaps deltaing as we go is
> suboptimal compared to just writing out a lot of redundant data and
> repacking it all at once later.

"Just" writing 26GB? And that's only one file. If I were to do that for
the whole repository, it would yield a > 100GB pack, instead of < 2GB
currently.

Mike
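
For reference, the "manual trimming of common head and tail" mentioned in the
quoted message can be illustrated roughly as follows. This is only a minimal
Python sketch of the general idea, not the change that produced the 3:33
figure; the names trim_common and middle are hypothetical and do not exist in
git or xdiff. The idea is to skip the bytes two versions share at the start
and at the end, so that the delta code only has to examine the part that
actually differs.

```python
def trim_common(old: bytes, new: bytes) -> tuple[int, int]:
    """Return (prefix_len, suffix_len) shared by both buffers.

    Hypothetical helper for illustration; not part of git.
    """
    limit = min(len(old), len(new))

    # Length of the common head.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common tail; it must not overlap the head
    # that was already matched.
    limit -= prefix
    suffix = 0
    while suffix < limit and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    return prefix, suffix


def middle(data: bytes, prefix: int, suffix: int) -> bytes:
    """Return the part of data left after dropping the common head and tail."""
    return data[prefix:len(data) - suffix]


if __name__ == "__main__":
    old = b"header\nline A\nfooter\n"
    new = b"header\nline B\nline C\nfooter\n"
    p, s = trim_common(old, new)
    # Only these two short slices would need to be handed to the delta code.
    print(p, s, middle(old, p, s), middle(new, p, s))
```

When successive revisions of a large file differ only in a small region, the
two slices returned by middle() are much smaller than the full buffers, which
is where the time saving would come from.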