Re: [WIP v2 2/2] pack-objects: support --blob-max-bytes

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Thu, 15 Jun 2017 16:28:24 -0400

On 6/2/2017 6:26 PM, Jeff King wrote:
> On Fri, Jun 02, 2017 at 12:38:45PM -0700, Jonathan Tan wrote:
...
> We have a name-hash cache extension in the bitmap file, but it doesn't
> carry enough information to deduce the .git-ness of a file. I don't
> think it would be too hard to add a "flags" extension, and give a single
> bit to "this is a .git file".
>
> I do also wonder if the two features would need to be separated for a
> GVFS-style solution. If you're not just avoiding large blobs but trying
> to get a narrow clone, you don't want the .git files from the
> uninteresting parts of the tree. You want to get no blobs at all, and
> then fault them in as they become relevant due to user action.
>
> -Peff

I agree with Peff here.  I've been working on my partial/narrow/sparse
clone/fetch ideas since my original RFC and have come to the conclusion
that the server can do the size limiting efficiently, but we should
leave the pathname filtering to the client.  That is, let the client
get the commits and trees, apply pattern matching locally (whether
that is a sparse-checkout definition or simple ".git*" matching),
and then make a later request for the "blobs of interest".
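
To make the client-side step concrete, here is a rough sketch of the
filtering I have in mind.  It is illustration only: the function name
and the (path, blob-id) tuples are made up for this example and are not
existing git code, and it matches on the basename only, which is a
simplification of real sparse-checkout rules.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def blobs_of_interest(tree_entries, patterns):
    """Pick the blob ids the client would request in a later round trip.

    tree_entries: iterable of (path, blob_id) pairs obtained by walking
                  the trees the client already has (hypothetical shape).
    patterns:     glob patterns applied to the basename, e.g. [".git*"].
    """
    wanted = []
    for path, blob_id in tree_entries:
        name = basename(path)
        # Case-sensitive glob match so the result does not depend on the OS.
        if any(fnmatchcase(name, pat) for pat in patterns):
            wanted.append(blob_id)
    return wanted


entries = [
    ("src/main.c", "a1"),
    (".gitignore", "b2"),
    ("docs/.gitattributes", "c3"),
]
print(blobs_of_interest(entries, [".git*"]))
```

The server never sees the patterns here; it only sees the resulting
list of blob ids.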

Whether we "fault-in" the missing blobs or have a "fetch-blobs"
command (like fetch-pack) to get them is open to debate.

There are concerns about the size of the requested blob-id list in a
fetch-blobs approach, but I think there are ways to say "I need all
of the blobs referenced by the directory /foo in commit xxxx" (and,
optionally, "that aren't present in directory /foo in commit yyyy
or in the range yyyy..xxxx").  (handwave)
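
As a toy model of that kind of request (still handwaving): suppose each
commit is flattened to a dict of path -> blob id.  The names below are
hypothetical and nothing like this exists in the protocol today; the
sketch only covers the two-commit case, not the yyyy..xxxx range.

```python
def blobs_under(tree, directory):
    """Return the set of blob ids referenced below `directory`.

    tree: dict mapping full path -> blob id for one commit (assumed shape).
    """
    # Normalize "/foo" or "foo/" to "foo/" so that "foobar/x" is not matched.
    prefix = directory.strip("/") + "/"
    return {blob for path, blob in tree.items() if path.startswith(prefix)}


def blobs_needed(want_tree, have_tree, directory):
    """Blobs under `directory` in want_tree that have_tree lacks there."""
    return blobs_under(want_tree, directory) - blobs_under(have_tree, directory)


xxxx = {"foo/a.txt": "b1", "foo/sub/b.txt": "b2", "bar/c.txt": "b3"}
yyyy = {"foo/a.txt": "b1", "foo/sub/b.txt": "b0"}
print(sorted(blobs_needed(xxxx, yyyy, "/foo")))
```

The point is that the client sends a directory and one or two commit
ids rather than the full blob-id list; the set difference would be
computed on the server side.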

Jeff