Re: fast-import slowness when importing large files with small differences

On Fri, Jun 29 2018, Mike Hommey wrote:

> I noticed some slowness when fast-importing data from the Firefox mercurial
> repository, where fast-import spends more than 5 minutes importing ~2000
> revisions of one particular file. I reduced it to a test case that
> still uses real data. One could synthesize data with roughly the same
> properties, but I figured real data could be useful.
>
> To reproduce:
> $ git clone https://gist.github.com/b6b8edcff2005cc482cf84972adfbba9.git foo
> $ git init bar
> $ cd bar
> $ python ../foo/import.py ../foo/data.gz | git fast-import --depth=2000
>
> [...]
> So maybe it would make sense to consolidate the diff code (after all,
> diff-delta.c is an old specialized fork of xdiff). With manual trimming
> of common head and tail, this gets down to 3:33.
>
> I'll also note that Facebook has imported xdiff from the git code base
> into mercurial and improved its performance, so it might also be worth
> looking at what could be taken from there.
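
For anyone unfamiliar with the trick: "trimming of common head and
tail" here means skipping the identical prefix and suffix of the two
buffers, so the expensive delta search only has to look at the middle
that actually changed. A rough Python sketch of the idea (not the
actual C code in diff-delta.c):

    def trim_common(a, b):
        """Skip the bytes a and b share at the start and end, so the
        delta search only sees the part that actually changed."""
        limit = min(len(a), len(b))
        head = 0
        while head < limit and a[head] == b[head]:
            head += 1
        tail = 0
        while tail < limit - head and a[-1 - tail] == b[-1 - tail]:
            tail += 1
        # Return the offset plus the trimmed middles; the shared head
        # and tail become plain copy operations in the resulting delta.
        return head, a[head:len(a) - tail], b[head:len(b) - tail]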

It would be interesting to see how this compares with a more naïve
approach of committing every version of this file one-at-a-time into a
new repository (with & without gc.auto=0). Perhaps deltifying as we go
is suboptimal compared to just writing out a lot of redundant data and
repacking it all at once later.
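
Something like the following untested sketch would measure that. It
assumes the ~2000 versions of the file have been extracted to a
versions/ directory, one file per revision, oldest first (which the
gist's import.py doesn't do out of the box), so treat those paths as
hypothetical:

    import glob
    import shutil
    import subprocess

    def git(*args):
        # Run a git command inside the scratch repository.
        subprocess.run(["git", "-C", "naive"] + list(args), check=True)

    subprocess.run(["git", "init", "naive"], check=True)
    # Disable automatic gc so nothing gets deltified mid-import.
    git("config", "gc.auto", "0")
    for path in sorted(glob.glob("versions/*")):
        shutil.copy(path, "naive/file")
        git("add", "file")
        git("commit", "-q", "-m", "import " + path)
    # Pay the delta cost once, at the end, with the same depth that
    # fast-import was given; --window is another knob worth trying.
    git("repack", "-a", "-d", "-f", "--depth=2000", "--window=250")

Timing the commit loop and the final repack separately should show
whether the cost is in deltifying as we go or inherent to the data.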


