On Sun, 2017-09-17 at 08:36 +0100, Ian Campbell wrote:
> +if test -n "$state_branch"
> +then
> > +	echo "Saving rewrite state to $state_branch" 1>&2
> > +	state_blob=$(
> > +		perl -e'opendir D, "../map" or die;
> > +			open H, "|-", "git hash-object -w --stdin" or die;
> > +			foreach (sort readdir(D)) {
> > +				next if m/^\.\.?$/;
> > +				open F, "<../map/$_" or die;
> > +				chomp($f = <F>);
> > +				print H "$_:$f\n" or die;
> > +			}
> > +			close(H) or die;' || die "Unable to save state")

One thing I've noticed is that for a full Linux tree history the
filter.map file is 50M+, which causes GitHub to complain:

  remote: warning: File filter.map is 54.40 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

(you can simulate this with `git log --pretty=format:"%H:%H" upstream/master`.)

I suppose that's not a bad recommendation for any infra, not just GH's.

The blob is compressed in the object store so there isn't _much_ point
in compressing the map (also, it only goes down to ~30MB anyway, so we
aren't buying all that much time), but I'm wondering if perhaps I
should look into a more intelligent representation, perhaps hashed by
the first two characters (as .git/objects is) to divide the map into
several blobs and have two levels; a rough sketch of what I mean is at
the end of this mail.

I'm also wondering if the .git-rewrite/map directory, which will have
70k+ (and growing) directory entries for a modern Linux tree, would
benefit from the same sort of thing. OTOH in this case the extra shell
machinations to turn abcdef123 into ab/cdef123 might overwhelm the
savings in directory lookup time (unless there is a helper already for
that; the second snippet below shows the sort of thing I mean). That
assumes that directory lookup is even a bottleneck; I've not measured,
but anecdotally/gut-feeling the commits-per-second does seem to be
decreasing over the course of the filtering process.
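
For the two-level blob idea, what I have in mind is roughly the below
(completely untested sketch; the ../map-shards staging directory and
the use of `git mktree` are just made up for illustration, and the
side of the patch that loads the state back would of course need the
matching change to read a tree rather than a single blob):

    mkdir ../map-shards &&
    perl -e'
        opendir D, "../map" or die;
        foreach (sort readdir(D)) {
            next if m/^\.\.?$/;
            open F, "<", "../map/$_" or die;
            chomp(my $new = <F>);
            close F;
            # bucket by the first two characters, as .git/objects does
            open S, ">>", "../map-shards/" . substr($_, 0, 2) or die;
            print S "$_:$new\n" or die;
            close S or die;
        }
    ' || die "Unable to shard map"
    state_tree=$(
        for shard in ../map-shards/??
        do
            blob=$(git hash-object -w "$shard") &&
            printf "100644 blob %s\t%s\n" "$blob" "${shard##*/}" ||
            exit
        done | git mktree
    ) || die "Unable to save state"

That should keep each individual blob down to a couple of hundred KB
for a Linux-sized history, and a shard which didn't change since the
last run hashes to the same blob again.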
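
As for the ab/cdef123 machinations, they needn't fork anything; with
plain POSIX parameter expansion it would be something like (again
untested):

    commit=abcdef123
    rest=${commit#??}            # cdef123
    prefix=${commit%"$rest"}     # ab
    echo "../map/$prefix/$rest"  # ../map/ab/cdef123

Whether doing that for every commit is actually cheaper than living
with one big flat directory is exactly the thing I'd need to measure
first.

Ian.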