On Fri, Jun 29 2018, Mike Hommey wrote:

> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced a testcase while still
> using real data. One could synthesize data with kind of the same
> properties, but I figured real data could be useful.
>
> To reproduce:
> $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> $ git init bar
> $ cd bar
> $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>
> [...]
> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.
>
> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be worth
> looking at what's worth taking from there.

It would be interesting to see how this compares with a more naïve
approach of committing every version of this file one at a time into a
new repository (with and without gc.auto=0). Perhaps deltaing as we go is
suboptimal compared to just writing out a lot of redundant data and
repacking it all at once later.
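
Something along these lines might do for the naive comparison. It is
only a rough sketch under assumptions: the helper names (git,
commit_versions, repack), the file name file.txt and the placeholder
identity are all made up for illustration, and extracting the ~2000
revisions from data.gz is left out because it depends on what import.py
expects. It commits one revision at a time, optionally with gc.auto=0,
and leaves all deltification to a single repack at the end.

```python
import os
import subprocess


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def commit_versions(repo, versions, filename="file.txt", disable_auto_gc=True):
    """Commit each item of `versions` (bytes) as one revision of `filename`.

    Returns the number of commits reachable from HEAD afterwards.
    """
    os.makedirs(repo, exist_ok=True)
    git(repo, "init", "-q")
    # Hypothetical identity, set locally so no global config is needed.
    git(repo, "config", "user.name", "Example")
    git(repo, "config", "user.email", "example@example.invalid")
    if disable_auto_gc:
        # Keep "git commit" from triggering "git gc --auto" along the way.
        git(repo, "config", "gc.auto", "0")
    path = os.path.join(repo, filename)
    for i, data in enumerate(versions):
        with open(path, "wb") as f:
            f.write(data)
        git(repo, "add", "--", filename)
        # --allow-empty so two identical consecutive versions do not abort.
        git(repo, "commit", "-q", "--allow-empty", "-m", f"revision {i}")
    return int(git(repo, "rev-list", "--count", "HEAD").strip())


def repack(repo, depth=2000):
    """Repack everything at once, recomputing deltas from scratch (-f)."""
    git(repo, "repack", "-a", "-d", "-f", "-q", f"--depth={depth}")
```

Timing commit_versions() plus repack() once with disable_auto_gc=True
and once with False would give the two numbers to put next to the
fast-import run quoted above.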