Re: [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch

On Sun, 2017-09-17 at 08:36 +0100, Ian Campbell wrote:
> +if test -n "$state_branch"
> +then
> +	echo "Saving rewrite state to $state_branch" 1>&2
> +	state_blob=$(
> +		perl -e'opendir D, "../map" or die;
> +			open H, "|-", "git hash-object -w --stdin" or die;
> +			foreach (sort readdir(D)) {
> +				next if m/^\.\.?$/;
> +				open F, "<../map/$_" or die;
> +				chomp($f = <F>);
> +				print H "$_:$f\n" or die;
> +			}
> +			close(H) or die;' || die "Unable to save state")

One thing I've noticed is that for a full Linux tree history the
filter.map file is 50M+, which causes GitHub to complain:

    remote: warning: File filter.map is 54.40 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

(You can generate a file of comparable size with `git log
--pretty=format:"%H:%H" upstream/master`.) I suppose that's not a bad
recommendation for any infra, not just GH's.
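
As a rough sanity check on that size, here is a minimal sketch of the
arithmetic. It assumes 40-character SHA-1 names and one "old:new" line
per rewritten commit, as the perl snippet above writes them; the
function name estimate_map_bytes is made up for illustration.

```shell
#!/bin/sh
# Sketch only: estimate the size of the state blob for a given number of
# rewritten commits. Assumes each entry is "<40 hex>:<40 hex>\n",
# i.e. 40 + 1 + 40 + 1 = 82 bytes. The function name is hypothetical.
estimate_map_bytes () {
	echo $(( $1 * 82 ))
}
```

Inside a repository it could be fed from
`git rev-list --count upstream/master` to see how close a given history
is to the 50 MB warning threshold.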

The blob is compressed in the object store, so there isn't _much_ point
in compressing the map (also, it only goes down to ~30MB anyway, so we
aren't buying all that much time), but I'm wondering if perhaps I
should look into a more intelligent representation, perhaps split by
the first two characters of the hash (as .git/objects is) into several
blobs, giving two levels.
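
To make the two-level idea concrete, here is a minimal sketch that splits
"old:new" map lines into one file per two-character prefix. This is not
what the patch does; the function name split_map_by_prefix and the
one-file-per-prefix layout are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: read "old:new" lines on stdin and append each one to
# "$1/<first two chars of old>", storing the rest of the old name plus
# the new name. The layout and the function name are hypothetical.
split_map_by_prefix () {
	outdir=$1
	mkdir -p "$outdir" || return 1
	while IFS=: read -r old new
	do
		# Skip anything too short to have a two-character prefix.
		case $old in
		??*) ;;
		*) continue ;;
		esac
		rest=${old#??}            # name without its first two chars
		prefix=${old%"$rest"}     # just the first two chars
		printf '%s:%s\n' "$rest" "$new" >>"$outdir/$prefix" || return 1
	done
}
```

Each bucket file could then be written with `git hash-object -w` and the
buckets collected into a tree, so that no single blob approaches the
50 MB mark. A shell read loop like this would be slow on a history the
size of Linux, so take it as an illustration of the layout rather than
something to run in the hot path.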

I'm also wondering if the .git-rewrite/map directory, which will have
70k+ (and growing) directory entries for a modern Linux tree, would
benefit from the same sort of thing. OTOH in this case the extra shell
machinations to turn abcdef123 into ab/cdef123 might overwhelm the
savings in directory lookup time (unless there is a helper already for
that). That assumes that directory lookup is even a bottleneck; I've not
measured, but anecdotally (gut feeling) the commits-per-second rate does
seem to decrease over the course of the filtering process.
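
For what it's worth, the abcdef123 to ab/cdef123 step can be done with
POSIX parameter expansion alone, without forking sed or cut. A minimal
sketch follows; the name fanout_path is made up, and I have not measured
whether this is cheaper than the directory lookups it would save.

```shell
#!/bin/sh
# Sketch only: turn "abcdef123" into "ab/cdef123" using parameter
# expansion, with no external process. Assumes the argument has at
# least two characters. The function name is hypothetical.
fanout_path () {
	rest=${1#??}                            # drop the first two chars
	printf '%s/%s\n' "${1%"$rest"}" "$rest" # first two chars, "/", rest
}
```

In a per-commit loop the two expansions would be better inlined than
called through `$(fanout_path ...)`, since the command substitution
itself forks a subshell.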

Ian.


