Hi,

On Thu, Oct 21, 2021 at 1:56 PM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> This session was led by Elijah Newren. Supporting cast: Johannes "Dscho"
> Schindelin, Jonathan Tan, Jonathan "jrnieder" Nieder, brian m. carlson,
> Jeff "Peff" King, Ævar Arnfjörð Bjarmason, Emily Shaffer, CB Bailey,
> Taylor Blau, and Philip Oakley.
>
> Notes:
> ...
>
> * Biggest idea: there are a lot of people who version control things via
>   tarballs or .zip files per version. This prevents history from
>   compressing well. Some people check in those compressed files into Git
>   for purposes of history.
> ...
>
> * Old suggestion of a “blob-tree” type that allows storing a single
>   index entry that corresponds to multiple trees and blobs in the
>   background, possibly.
>
> * One long-term dream (inspired by Avery Pennarun’s “bup” tool) is to
>   store large binary files in a tree-structured way that can store
>   common regions as deltas, improve random access, parallelized
>   hashing. Involves a consistent way to split the file into stable
>   pieces, like --rsyncable uses (based on a rolling hash being zero).
>
> * Peff: you can do that at the object model layer or at the storage
>   layer. The latter is less invasive.
>
> * jrnieder: The benefits of blobtree are greater at the object model
>   layer --- e.g. not having to transmit chunks over the wire that you
>   already have. I think the main obstacle has been that the benefits
>   haven’t been enough to be worth the complexity. If that changes, we
>   can imagine bundling it with some other object format changes, e.g.
>   putting blob sizes in tree objects, and rolling it out as a new
>   object-format.
>

I think this was implemented as 'Blob Ref' in Yandex's VCS named Arc.

I was suggesting this to GitLab folks earlier (1) as a possible solution
for large file storage. Very glad to hear that it was brought up during
the summit.

Cheers,
Son Luong.

(1): https://gitlab.com/gitlab-org/git/-/issues/93
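For readers unfamiliar with the "split the file into stable pieces" idea in the notes, here is a minimal Python sketch of content-defined chunking. It assumes a generic polynomial (Rabin-Karp style) rolling hash over a 48-byte window and cuts wherever the low 12 bits of the hash are zero. The window size, mask, base and modulus are illustrative choices, and this is not the exact rollsum used by bup or by gzip --rsyncable.

```python
import random
from typing import List

# Illustrative parameters (assumptions, not taken from bup, gzip or Git):
WINDOW = 48            # bytes covered by the rolling window
MASK = (1 << 12) - 1   # cut when the low 12 bits are zero (~4 KiB average chunk)
BASE = 257
MOD = (1 << 61) - 1    # Mersenne prime modulus keeps the hash bits well mixed
OUT_FACTOR = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window


def chunk_boundaries(data: bytes) -> List[int]:
    """Return the end offset of every content-defined chunk in `data`."""
    boundaries: List[int] = []
    h = 0
    for i, byte in enumerate(data):
        # Shift the incoming byte into the polynomial rolling hash.
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Drop the byte that just slid out of the window.
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD
        # Cut only when the window is full and the masked hash is zero,
        # so a boundary depends on the last WINDOW bytes alone.
        if i + 1 >= WINDOW and (h & MASK) == 0:
            boundaries.append(i + 1)
    # The tail after the last natural boundary becomes the final chunk.
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))
    return boundaries


def split_chunks(data: bytes) -> List[bytes]:
    """Split `data` into chunks at the content-defined boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    rng = random.Random(1234)
    original = bytes(rng.randrange(256) for _ in range(200_000))
    edited = b"inserted!" + original  # small edit near the start
    old, new = split_chunks(original), split_chunks(edited)
    shared = set(old) & set(new)
    print(f"{len(old)} chunks before, {len(new)} after, {len(shared)} shared")
```

Because each cut depends only on the bytes inside the window, inserting data near the start of a file disturbs only the chunk around the edit; later boundaries realign, so the remaining chunks can be shared with the previous version. Real chunkers usually also enforce minimum and maximum chunk sizes, which this sketch omits.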