On Fri, Jun 29, 2018 at 3:18 AM Mike Hommey <mh@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced a testcase while still
> using real data. One could synthesize data with kind of the same
> properties, but I figured real data could be useful.

I cc'd Jameson, who refactored memory allocation in fast-import recently.
(I am not aware of other refactorings in the area of fast-import.)

> To reproduce:
[...]
> Memory total: 2282 KiB
> pools: 2048 KiB
> objects: 234 KiB
> [...]
> Obviously, sha1'ing 26GB is not going to be free, but it's also not the
> dominating cost, according to perf:
>
> 63.52% git-fast-import git-fast-import [.] create_delta_index

So this doesn't sound like a memory issue, but a diffing/deltaing issue.

> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.

This sounds interesting. I'd love to see that code unified.

> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved performance on it, so it might also be worth
> looking at what's worth taking from there.
So starting with https://www.mercurial-scm.org/repo/hg/rev/34e2ff1f9cd8
("xdiff: vendor xdiff library from git")
they adapted it slightly:

  $ hg log --template '{node|short} {desc|firstline}\n' -- mercurial/thirdparty/xdiff/
  a2baa61bbb14 xdiff: move stdint.h to xdiff.h
  d40b9e29c114 xdiff: fix a hard crash on Windows
  651c80720eed xdiff: silence a 32-bit shift warning on Windows
  d255744de97a xdiff: backport int64_t and uint64_t types to Windows
  e5b14f5b8b94 xdiff: resolve signed unsigned comparison warning
  f1ef0e53e628 xdiff: use int64 for hash table size
  f0d9811dda8e xdiff: remove unused xpp and xecfg parameters
  49fe6249937a xdiff: remove unused flags parameter
  882657a9f768 xdiff: replace {unsigned ,}long with {u,}int64_t
  0c7350656f93 xdiff: add comments for fields in xdfile_t
  f33a87cf60cc xdiff: add a preprocessing step that trims files
  3cf40112efb7 xdiff: remove xmerge related logic
  90f8fe72446c xdiff: remove xemit related logic
  b5bb0f99064d xdiff: remove unused structure, functions, and constants
  09f320067591 xdiff: remove whitespace related feature
  1f9bbd1d6b8a xdiff: fix builds on Windows
  c420792217c8 xdiff: reduce indent heuristic overhead
  b3c9c483cac9 xdiff: add a bdiff hunk mode
  9e7b14caf67f xdiff: remove patience and histogram diff algorithms
  34e2ff1f9cd8 xdiff: vendor xdiff library from git

Interesting pieces regarding performance:

c420792217c8 xdiff: reduce indent heuristic overhead
https://phab.mercurial-scm.org/rHGc420792217c89622482005c99e959b9071c109c5

f33a87cf60cc xdiff: add a preprocessing step that trims files
https://phab.mercurial-scm.org/rHGf33a87cf60ccb8b46e06b85e60bc5031420707d6

I'll see if I can make that into patches.

Thanks,
Stefan