On Fri, Mar 8, 2019 at 11:13 PM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> This is indeed a nice feature to have, and thanks for details of how
> this would be accomplished.
>
> One issue is that when cloning a repository, we do not download many
> files - we only download one dynamically generated packfile containing
> all the objects we want.

Since the packfile is generated dynamically for a specific client
request and is discarded by the server as soon as the connection
closes, is this the reason why we cannot pause the download partway,
as download managers can?

I read through the "Git Internals" chapter of the Pro Git book, and the
following thought came to me.

Assume a packfile as follows:

---
$ git verify-pack -v .git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.idx
b042a60ef7dff760008df33cee372b945b6e884e blob 22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob 9 20 7262 1 \
  b042a60ef7dff760008df33cee372b945b6e884e
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok
---

Here the 033b blob refers to the b042 blob, and the two blobs are
different versions of the same file. Before this pack was made, both
blobs were stored separately and therefore took more space. The
packfile saves space by storing only the latest version in full, plus
its delta against the earlier version. Both the delta and the latest
version are stored in compressed form, right?

Now, here is another approach to saving space without creating a pack.
Earlier, the two blobs had their object files at:

  .git/objects/03/3b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5
  .git/objects/b0/42a60ef7dff760008df33cee372b945b6e884e

Let's say b042 is the latest version and 033b is its earlier version.
What Git does in the packfile could be done right here by storing the
latest version in
.git/objects/b0/42a60ef7dff760008df33cee372b945b6e884e and its delta in
.git/objects/03/3b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5. To the delta
object we can add a header that tells Git to look up
.git/objects/b0/42a60ef7dff760008df33cee372b945b6e884e and apply the
delta to it to get the earlier version.

Doing this eliminates the big packfile, and all the objects are spread
across directories. We can now make this resumable, right?

Please point out what I missed here. Is it possible to do the above?
If yes, then what was the reason for introducing the concept of a
packfile?

> You might be interested in some work I'm doing to offload part of the
> packfile response to CDNs:
>
> https://public-inbox.org/git/cover.1550963965.git.jonathantanmy@xxxxxxxxxx/
>
> This means that when cloning/fetching, multiple files could be
> downloaded, meaning that a scheme like you suggest would be more
> worthwhile. (In fact, I allude to such a scheme in the design document
> in patch 5.)

I am currently reading through all the discussion on this strategy.
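
P.S. To make the delta relationship in the verify-pack listing above
concrete, here is a minimal Python sketch that reads such output and
reports which objects are stored as deltas and against which base. It
assumes the column layout documented for `git verify-pack -v` (object
name, type, size, size in pack, offset, and, for deltified objects only,
delta depth and base object name). The names `PackEntry` and
`parse_verify_pack` are made up for illustration and are not part of Git.

```python
from typing import List, NamedTuple, Optional


class PackEntry(NamedTuple):
    """One object line from `git verify-pack -v` (hypothetical helper type)."""
    sha: str
    type: str
    size: int           # size of the object (for a delta: size of the delta data)
    size_in_pack: int   # compressed size inside the packfile
    offset: int         # byte offset inside the packfile
    depth: Optional[int]  # delta chain depth, None if stored in full
    base: Optional[str]   # base object name, None if stored in full


def parse_verify_pack(text: str) -> List[PackEntry]:
    """Parse object lines, skipping summary lines such as '...pack: ok'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Object lines have 5 columns (full object) or 7 (deltified object)
        # and start with a 40-character hexadecimal object name.
        if len(fields) not in (5, 7) or len(fields[0]) != 40:
            continue
        try:
            int(fields[0], 16)
        except ValueError:
            continue
        depth = int(fields[5]) if len(fields) == 7 else None
        base = fields[6] if len(fields) == 7 else None
        entries.append(PackEntry(fields[0], fields[1], int(fields[2]),
                                 int(fields[3]), int(fields[4]), depth, base))
    return entries


# The listing from this mail, with the wrapped line joined back together.
SAMPLE = """\
b042a60ef7dff760008df33cee372b945b6e884e blob 22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob 9 20 7262 1 b042a60ef7dff760008df33cee372b945b6e884e
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok
"""

if __name__ == "__main__":
    for entry in parse_verify_pack(SAMPLE):
        if entry.base is None:
            print(f"{entry.sha[:4]} stored in full, {entry.size_in_pack} bytes in pack")
        else:
            print(f"{entry.sha[:4]} stored as delta against {entry.base[:4]}, "
                  f"{entry.size_in_pack} bytes in pack")
```

Run against the sample, this shows 033b as the only deltified object,
with b042 as its base, which matches the reading given earlier in this
mail.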