Martin Fick <mfick@xxxxxxxxxxxxxx> writes:

> On 2021-12-16 14:20, João Victor Bonfim wrote:
>>> To expand on this, if what you're storing is already compressed, like
>>> Ogg Vorbis files or PNGs, like are found in that repository, then
>>> generally they will not delta well. This is also true of things like
>>> Microsoft Office or OpenOffice documents, because they're essentially
>>> Zip files.
>>> The delta algorithm looks for similarities between files to compress
>>> them. If a file is already compressed using something like Deflate,
>>> used in PNGs and Zip files, then even very similar files will
>>> generally look very different, so deltification will generally be
>>> ineffective.
> ...
>> Maybe I am thinking too outside the box, but wouldn't it be quite more
>> effective for git to identify compressed files, specially on edge cases
>> where the compression doesn't have a good chemistry with delta
>> compression, decompress them for repo storage while also storing the
>> compression algorithm as some metadata tag (like a text string or an
>> ID code decided beforehand), and, when creating the work mirrors,
>> return the compression to its default state before checkout?
>
> I suspect that for most algorithms and their implementations, this would
> not result in repeatable "recompressed" results. Thus the checked-out
> files might be different every time you checked them out. :(

That is probably too application specific to be in core-git, but it
is probably a good application for smudge/clean filters like brian
alluded to?
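
To make the repeatability concern concrete, here is a rough Python
sketch of what such a filter pair might do. It is only an illustration
under stated assumptions: the clean() and smudge() helpers are
hypothetical names, zlib stands in for whatever compression the real
file format uses, and nothing here is how git itself behaves. It
assumes the authoring tool compressed at level 9 while the filter,
not knowing that, recompresses at the default level.

```python
import zlib


def clean(worktree_bytes: bytes) -> bytes:
    """Hypothetical 'clean' step: store the uncompressed payload."""
    return zlib.decompress(worktree_bytes)


def smudge(stored_bytes: bytes, level: int = 6) -> bytes:
    """Hypothetical 'smudge' step: recompress the payload on checkout."""
    return zlib.compress(stored_bytes, level)


# Assumption: the authoring tool wrote the file with level 9.
payload = b"the quick brown fox jumps over the lazy dog\n" * 200
original = zlib.compress(payload, 9)

# Round trip through the filter pair, which only knows the default level.
stored = clean(original)
checked_out = smudge(stored)

# The content survives the round trip...
same_content = zlib.decompress(checked_out) == payload
# ...but the bytes on disk need not match what was committed. Here they
# differ at least in the zlib header, which records the level used.
same_bytes = checked_out == original

print("same content:", same_content)
print("same bytes:  ", same_bytes)
```

In a real setup the two helpers would be external commands wired up
through a filter attribute in .gitattributes and the matching
filter.<name>.clean and filter.<name>.smudge config entries. Whether
the checked-out bytes are stable then depends entirely on the
recompressor reproducing the original settings, which is the caveat
Martin raised above.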