On Fri, Oct 05, 2018 at 06:44:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Some version of the former. Ones where we haven't found any (or much of)
> useful deltas yet. E.g. say I had a repository with a lot of files
> generated by this command at various points in the history:
>
>     dd if=/dev/urandom of=file.binary count=1024 bs=1024
>
> Some script similar to git-sizer which could report that the
> packed+compressed+delta'd version of the 10 *.binary files I had in my
> history had a 1:1 ratio of how large they were in .git, v.s. how large
> the sum of each file retrieved by "git show" was (i.e. uncompressed,
> un-delta'd).

You can get the uncompressed and on-disk sizes with:

  git cat-file --batch-all-objects \
    --batch-check='%(objectname) %(objectsize) %(objectsize:disk)'

and then compare the sizes/ratios however you like. If you want just a
subset of the blobs, drop the "--batch-all-objects" and just feed the
object names (or even "HEAD:filename") on stdin.

> That doesn't mean that tomorrow I won't commit 10 new objects which
> would have a really good delta ratio to those 10 existing files,
> bringing the ratio to ~1:2, but if I had some report like:
>
>     <ratio> <extension>
>
> For a given repo that could be fed into .gitattributes to say we
> shouldn't bother to delta files of certain extensions.

I don't know of a tool that does that, but I think a modest application
of perl to the cat-file output would produce it.

-Peff
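For illustration, here is a minimal sketch of the "<ratio> <extension>" report described above. It uses awk rather than perl (a swapped-in tool, not code from this thread), and the function name `ratio_by_ext` is hypothetical. It assumes its input comes from `git cat-file --batch-check` with `%(objecttype)` and `%(rest)` added to the format, fed by `git rev-list --objects --all` so that each blob line carries a path. Because rev-list lists each blob only once, under the first path it reaches it by, the per-extension attribution is approximate.

```shell
# Hypothetical helper: summarize, per file extension, the ratio of
# on-disk (compressed and possibly delta'd) size to uncompressed size.
# Expects lines on stdin of the form
#   <objectname> <objecttype> <objectsize> <objectsize:disk> <path>
ratio_by_ext () {
	awk '
	# Only blobs that came with a path; trees, commits and nameless objects are skipped.
	$2 == "blob" && NF >= 5 {
		# The path is everything after the first four fields (it may contain spaces).
		path = $0
		sub(/^[^ ]+ [^ ]+ [^ ]+ [^ ]+ /, "", path)
		# Keep only the last path component.
		n = split(path, parts, "/")
		base = parts[n]
		# Extension is the text after the last dot; no dot means "(none)".
		if (match(base, /\.[^.]+$/))
			ext = substr(base, RSTART + 1)
		else
			ext = "(none)"
		size[ext] += $3
		disk[ext] += $4
	}
	END {
		# One "<ratio> <extension>" line per extension; a ratio near 1.00
		# means compression and deltas saved almost nothing.
		for (ext in size)
			if (size[ext] > 0)
				printf "%.2f %s\n", disk[ext] / size[ext], ext
	}' | LC_ALL=C sort -rn
}
```

Usage, from inside a repository:

  git rev-list --objects --all |
  git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(objectsize:disk) %(rest)' |
  ratio_by_ext

Extensions with the least savings sort to the top, which is the list you would consider for a "-delta" entry in .gitattributes.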