Re: Fw: Curiosity

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 16 Dec 2021 13:42:33 -0800

Martin Fick <mfick@xxxxxxxxxxxxxx> writes:

> On 2021-12-16 14:20, João Victor Bonfim wrote:
>>> To expand on this, if what you're storing is already compressed, like
>>> the Ogg Vorbis files or PNGs found in that repository, then
>>> generally they will not delta well. This is also true of things like
>>> Microsoft Office or OpenOffice documents, because they're essentially
>>> Zip files.
>>> The delta algorithm looks for similarities between files to compress
>>> them. If a file is already compressed using something like Deflate,
>>> used in PNGs and Zip files, then even very similar files will generally
>>> look very different, so deltification will generally be ineffective.
> ...
>> Maybe I am thinking too far outside the box, but wouldn't it be more
>> effective for git to identify compressed files, especially in edge cases
>> where the compression doesn't interact well with delta compression,
>> decompress them for repo storage while also storing the compression
>> algorithm as a metadata tag (like a text string or an ID code decided
>> beforehand), and, when creating the working copies, restore the
>> compression to its original state at checkout?
>
> I suspect that for most algorithms and their implementations, this would
> not result in repeatable "recompressed" results. Thus the checked-out
> files might be different every time you checked them out. :(

That is probably too application-specific to be in core-git, but it
is probably a good fit for smudge/clean filters, as brian alluded
to?
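
To make the smudge/clean idea concrete, here is a minimal sketch of
such a filter pair, assuming the tracked files are bare zlib streams.
Real formats like PNG or Zip have their own containers and would need
format-specific handling. The script name rezip.py, the filter name
"rezip" and the fixed compression level are all hypothetical choices
for illustration, not anything that exists in git.

```python
import sys
import zlib


def clean(worktree_bytes: bytes) -> bytes:
    # Working tree -> repository: store the uncompressed payload so
    # that similar versions have a chance to delta against each other.
    return zlib.decompress(worktree_bytes)


def smudge(stored_bytes: bytes, level: int = 6) -> bytes:
    # Repository -> working tree: recompress at checkout. The level is
    # an assumption; the original file's settings are not recorded.
    return zlib.compress(stored_bytes, level)


def main(argv):
    # Git feeds the file content to a filter command on stdin and
    # reads the converted content from its stdout.
    if len(argv) == 2 and argv[1] in ("clean", "smudge"):
        data = sys.stdin.buffer.read()
        out = clean(data) if argv[1] == "clean" else smudge(data)
        sys.stdout.buffer.write(out)


if __name__ == "__main__":
    main(sys.argv)
```

It could be wired up with something like

    git config filter.rezip.clean "python3 rezip.py clean"
    git config filter.rezip.smudge "python3 rezip.py smudge"

plus a line such as "*.zz filter=rezip" in .gitattributes (the
pattern and the script location are placeholders). Martin's caveat
still applies: the payload survives the round trip, but the
checked-out bytes match the original file only if smudge happens to
reproduce the original compressor's settings and implementation.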