Re: [Summit topic] Crazy (and not so crazy) ideas

Son Luong Ngoc <sluongng@xxxxxxxxx> · Thu, 21 Oct 2021 14:30:50 +0200

Hi,

On Thu, Oct 21, 2021 at 1:56 PM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> This session was led by Elijah Newren. Supporting cast: Johannes "Dscho"
> Schindelin, Jonathan Tan, Jonathan "jrnieder" Nieder, brian m. carlson,
> Jeff "Peff" King, Ævar Arnfjörð Bjarmason, Emily Shaffer, CB Bailey,
> Taylor Blau, and Philip Oakley.
>
> Notes:
>

...

>
> * Biggest idea: there are a lot of people who version control things via
>   tarballs or .zip files per version. This prevents history from
>   compressing well. Some people check in those compressed files into Git
>   for purposes of history.
>

...

>
>    * Old suggestion of a “blob-tree” type that allows storing a single
>      index entry that corresponds to multiple trees and blobs in the
>      background, possibly.
>
>    * One long-term dream (inspired by Avery Pennarun’s “bup” tool) is to
>      store large binary files in a tree-structured way that can store
>      common regions as deltas, improve random access, and allow
>      parallelized hashing. This involves a consistent way to split the
>      file into stable pieces, like --rsyncable uses (based on a rolling
>      hash being zero).
>
>    * Peff: you can do that at the object model layer or at the storage
>      layer. The latter is less invasive.
>
>    * jrnieder: The benefits of blobtree are greater at the object model
>      layer --- e.g. not having to transmit chunks over the wire that you
>      already have. I think the main obstacle has been that the benefits
>      haven’t been enough to be worth the complexity. If that changes, we
>      can imagine bundling it with some other object format changes, e.g.
>      putting blob sizes in tree objects, and rolling it out as a new
>      object-format.
>
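For anyone who wants to play with the idea, here is a toy sketch of the content-defined chunking and "blob-tree" ideas from the notes above. It is a minimal illustration under stated assumptions, not Git's object format, bup's code, or Arc's Blob Ref. It assumes a simple polynomial rolling hash as a stand-in (bup and gzip's --rsyncable each use their own rolling checksum). It also assumes illustrative parameters (WINDOW, MASK, min_size) and a hypothetical blob-tree represented as a list of (SHA-1 of chunk, size) pairs. A chunk boundary is cut wherever the low bits of the rolling hash are zero.

```python
import hashlib

# Illustrative parameters (assumptions, not values from Git or bup).
WINDOW = 48              # bytes covered by the rolling hash
MASK = (1 << 12) - 1     # cut when the low 12 bits are zero: ~4 KiB average chunk
BASE = 257               # polynomial base for the rolling hash
MOD = (1 << 61) - 1      # large prime modulus

def chunk_boundaries(data: bytes, window=WINDOW, mask=MASK, min_size=64):
    """Return end offsets of content-defined chunks of `data`."""
    ends = []
    h = 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    last = 0                          # end offset of the previous chunk
    for i, b in enumerate(data):
        if i >= window:
            # Drop the oldest byte so h only covers the last `window` bytes.
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + b) % MOD
        # Cut where the hash's low bits are zero, skipping tiny chunks.
        if i + 1 - last >= min_size and (h & mask) == 0:
            ends.append(i + 1)
            last = i + 1
    if last < len(data):
        ends.append(len(data))        # trailing partial chunk
    return ends

def build_blobtree(data: bytes):
    """Hypothetical blob-tree: one (sha1 hex, size) entry per chunk."""
    entries = []
    start = 0
    for end in chunk_boundaries(data):
        piece = data[start:end]
        entries.append((hashlib.sha1(piece).hexdigest(), len(piece)))
        start = end
    return entries

def chunks_to_send(new_tree, have):
    """Chunk ids in `new_tree` that a receiver holding `have` still lacks."""
    missing, seen = [], set()
    for oid, _ in new_tree:
        if oid not in have and oid not in seen:
            missing.append(oid)
            seen.add(oid)
    return missing

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    v1 = bytes(rng.getrandbits(8) for _ in range(200_000))
    v2 = v1[:100_000] + b"inserted bytes" + v1[100_000:]
    t1, t2 = build_blobtree(v1), build_blobtree(v2)
    missing = chunks_to_send(t2, {oid for oid, _ in t1})
    print(f"{len(t2)} chunks in new version, {len(missing)} need sending")
```

Because boundaries depend only on the bytes inside the window, inserting a few bytes in the middle of the file changes only the chunk(s) around the edit, and the boundaries re-synchronize afterwards. With fixed-size chunks, every chunk after the insertion point would shift. That is the property that would let a blob-tree at the object model layer avoid re-sending chunks the other side already has.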

I think this was implemented as 'Blob Ref' in Arc, Yandex's VCS.
I suggested this to the GitLab folks earlier (1) as a possible solution for
large file storage.

Very glad to hear that it was brought up during the summit.

Cheers,
Son Luong.

(1): https://gitlab.com/gitlab-org/git/-/issues/93