Re: Binary files

Hi Volodymyr,

On 20/07/2017 09:41, Volodymyr Sendetskyi wrote:
> It is known that git handles storing binary files in its
> repositories badly.
> This is especially true for large files: even without any changes
> to these files, their copies are snapshotted on each commit. So
> even repositories with a small amount of code can grow very fast
> in size if they contain some large binary files. Alongside this,
> SVN is much better about that, because it makes changes to the
> server version of a file only if some changes were done.

You already got some proposals on what you could try to make 
handling large binary files easier, but I just wanted to comment on 
this part of your message, as it doesn't seem to be correct.

Even though each commit includes every repository file (a commit 
being a full snapshot of the repository state), big binary files 
included, that's only how it looks from an end-user's perspective.

The actual implementation is smarter than that: if a file hasn't 
changed between commits, it won't get copied/written to the Git 
object database again.

Under the hood, many different commits can point to the same 
(unchanged) file, so the repository size _does not_ grow with each 
commit as long as a large binary file remains unchanged.

Usually, the biggest concern with Git and large files[1], in 
comparison to SVN, for example, is something else: the Git model 
assumes that each repository clone holds the complete repository 
history, with all the different file versions included, so you 
can't fetch just some of them, or the latest snapshot only, to 
keep your local repository small.

If the repository you're cloning from is a big one, your local 
clone will be big as well, even if you're not really interested in 
the big files at all... but you already got some suggestions for 
handling that, as pointed out :)

Just note that it's not really Git vs SVN here, but rather the 
distributed vs centralized approach in general, as you can't both 
have everything locally and skip parts of it at the same time. 
Different systems may have different workarounds for a specific 
workflow, though; one Git-side example is sketched below.
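
For instance, a shallow clone trims how much history you download 
(just a sketch with a made-up URL; it limits history depth, not the 
files reachable from the latest commit):

  $ git clone --depth 1 https://example.com/big-repo.git

You only get the objects needed for the most recent commit, at the 
cost of not having the full history locally.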

[1] Besides each file version being stored as a full-sized snapshot 
(at first, at least, until delta compression kicks in during 
packing).
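
If you want to see the effect of that packing on your own 
repository (purely illustrative; the numbers depend on the 
repository):

  $ git count-objects -v    # loose object count/size before packing
  $ git gc
  $ git count-objects -v    # most objects now live in a pack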

Regards,
Buga


