Re: Fw: Curiosity


 



> To expand on this, if what you're storing is already compressed, like
> Ogg Vorbis files or PNGs, like are found in that repository, then
> generally they will not delta well. This is also true of things like
> Microsoft Office or OpenOffice documents, because they're essentially
> Zip files.
>
> The delta algorithm looks for similarities between files to compress
> them. If a file is already compressed using something like Deflate,
> used in PNGs and Zip files, then even very similar files will generally
> look very different, so deltification will generally be ineffective.

This also explains why Git creates a new mode every time an edit is made:
it cannot recognize the similarities between the files, even though they
exist.
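brian's point that Deflate output hides similarity can be seen with a small experiment. The sketch below uses only Python's standard `zlib` module. It flips one byte in the middle of some synthetic, repetitive data, which stands in hypothetically for an uncompressed image, and then measures how much of each version falls outside the longest prefix and suffix it shares with the other. That measure is a crude proxy chosen for illustration. It is not Git's actual delta algorithm.

```python
import zlib


def changed_span(a: bytes, b: bytes) -> int:
    """Bytes of `b` outside the longest prefix and suffix it shares with `a`.

    A crude stand-in for how much a delta encoder could NOT copy from `a`;
    this is not Git's real delta algorithm.
    """
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return len(b) - prefix - suffix


# Hypothetical stand-in for an uncompressed asset: 2000 repetitive "rows".
original = b"".join(b"row %05d: " % i + bytes(range(32, 96)) + b"\n"
                    for i in range(2000))

# Change a single byte in the middle, like recolouring one pixel.
edited = bytearray(original)
edited[len(edited) // 2] ^= 0x01
edited = bytes(edited)

raw_span = changed_span(original, edited)
zip_span = changed_span(zlib.compress(original, 9), zlib.compress(edited, 9))

print(f"raw:     {raw_span} differing byte(s) out of {len(edited)}")
print(f"deflate: {zip_span} differing byte(s) out of "
      f"{len(zlib.compress(edited, 9))}")
```

On the raw data, the differing region is exactly the one edited byte. On the compressed streams, it runs from roughly the edit point to the end of the stream. This happens because the data after the change is typically re-encoded at shifted bit positions, and the trailing checksum also changes.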

> There are two main solutions to this. One is to store your data
> uncompressed in the repository and compress it as part of a build step.
> This makes your checkouts larger, but it makes your repository smaller.
>
> The other is to store them outside of the repository proper. Some folks
> use Git LFS for this, but you could also just store a manifest with file
> names and secure hashes, plus a download location for a public server.
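The manifest approach brian mentions can be sketched in a few lines of Python. In this sketch, the JSON layout, the field names and the `base_url` scheme are all hypothetical. The idea is to commit a small manifest listing each large asset's path, SHA-256 hash and download URL, and then have a helper report which local files are missing or out of date.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large assets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, names: list[str], base_url: str) -> str:
    """Build a JSON manifest (hypothetical format) to commit instead of the assets."""
    entries = [
        {
            "path": name,
            "sha256": sha256_of(root / name),
            "url": f"{base_url.rstrip('/')}/{name}",
        }
        for name in sorted(names)
    ]
    return json.dumps({"files": entries}, indent=2)


def needs_download(manifest_text: str, root: Path) -> list[str]:
    """Paths that are missing locally or whose contents don't match the manifest."""
    stale = []
    for entry in json.loads(manifest_text)["files"]:
        local = root / entry["path"]
        if not local.is_file() or sha256_of(local) != entry["sha256"]:
            stale.append(entry["path"])
    return stale
```

A downloader would fetch the `url` of each path returned by `needs_download` and check the hash again before using the file. Only the small manifest changes in Git history, so the repository stays small no matter how the assets are compressed.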

Maybe I am thinking too far outside the box, but wouldn't it be much more
effective for Git to identify compressed files, especially in edge cases
where the compression doesn't play well with delta compression,
decompress them for repository storage while recording the compression
algorithm as a metadata tag (such as a text string or a predefined ID
code), and then recompress them to their original state when checking
out the working tree?

Of course, you would also need the reverse functions for when you want to
commit the data back to the repository.

Just throwing ideas out there.
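One catch with this idea is that Git identifies file contents by a hash of their exact bytes. Checkout would therefore have to reproduce each compressed file byte for byte. However, the same data can be Deflate-compressed into many different valid streams, depending on the encoder, level and settings. The Python sketch below is simplified to raw zlib streams rather than real PNG or Zip containers. It stores a file decompressed only when some zlib level recompresses it to the identical original, and otherwise it falls back to storing the bytes unchanged. `StoredBlob`, `to_storage` and `to_checkout` are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredBlob:
    """What the repository would keep under this hypothetical scheme."""
    payload: bytes        # decompressed data, or the original bytes as a fallback
    level: Optional[int]  # zlib level that reproduces the original; None = stored as-is


def to_storage(original: bytes) -> StoredBlob:
    """Store a zlib stream decompressed, but only if recompressing it
    reproduces the original bytes exactly; otherwise keep it unchanged."""
    try:
        raw = zlib.decompress(original)
    except zlib.error:
        return StoredBlob(original, None)      # not a zlib stream at all
    for level in range(10):                    # zlib levels 0..9
        if zlib.compress(raw, level) == original:
            return StoredBlob(raw, level)
    # Produced by another encoder or other settings: can't round-trip safely.
    return StoredBlob(original, None)


def to_checkout(blob: StoredBlob) -> bytes:
    """Rebuild the exact bytes that belong in the working tree."""
    if blob.level is None:
        return blob.payload
    return zlib.compress(blob.payload, blob.level)
```

The verify-and-fallback step is the important design choice. Any file that cannot be rebuilt exactly is stored as-is, so it stays correct but does not delta well.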

-------------------------------

João Victor Bonfim, any pronouns are welcome.

‐‐‐‐‐‐‐Original Message ‐‐‐‐‐‐‐

On Wednesday, December 15, 2021 at 23:19, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote:

> On 2021-12-15 at 18:07:20, Junio C Hamano wrote:
>
> > João Victor Bonfim JoaoVictorBonfim@xxxxxxxxxxxxxx writes:
> >
> > > Since Git is used for almost everything at this point, is there
> > > any intent to provide better support for non-textual file types?
> > >
> > > Why do I say this? Take this game mod, which I follow, as an example:
> > > https://github.com/SolariusScorch/XComFiles. Whenever I clone it,
> > > Git takes practically forever to download 452 MB of files, some of
> > > which, from my perspective, aren't being delta compressed like the
> > > text files are (whenever I read a log of the changes made, Git
> > > creates and undoes modes for all binary files, some of which only
> > > changed by a pixel from one colour to another).
> >
> > Our delta compression does not care whether the contents are text or
> > binary, so if it is not compressed well, it can be a sign that
> > the contents are not compressible to begin with, at least with the
> > xdelta binary-diff-patch engine we use. Improvement designs,
> > algorithms and patches are always welcome ;-)
> --
>
> brian m. carlson (he/him or they/them)
>
> Toronto, Ontario, CA



