On 6/2/2017 6:26 PM, Jeff King wrote:
On Fri, Jun 02, 2017 at 12:38:45PM -0700, Jonathan Tan wrote:
...
We have a name-hash cache extension in the bitmap file, but it doesn't carry enough information to deduce the .git-ness of a file. I don't think it would be too hard to add a "flags" extension, and give a single bit to "this is a .git file".

I do also wonder if the two features would need to be separated for a GVFS-style solution. If you're not just avoiding large blobs but trying to get a narrow clone, you don't want the .git files from the uninteresting parts of the tree. You want to get no blobs at all, and then fault them in as they become relevant due to user action.

-Peff
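To make the "flags" idea concrete, here is a rough sketch of how a per-object flag bit could sit next to a size limit on the server. It assumes a hypothetical extension storing one flags byte per object in pack/bitmap order, with bit 0x01 meaning "this is a .git file", and it assumes each object is associated with a single path (pack-objects records one name per object). `FLAG_GITFILE`, `build_flags`, `gitfile_positions` and `blobs_to_send` are made-up names; this is not an existing Git on-disk format.

```python
from typing import List, Sequence

# Hypothetical bit in a hypothetical per-object "flags" extension.
FLAG_GITFILE = 0x01  # "this blob is a .git* file"


def is_gitfile(path: str) -> bool:
    """True if the last path component starts with '.git' (e.g. .gitattributes)."""
    return path.rsplit("/", 1)[-1].startswith(".git")


def build_flags(paths: Sequence[str]) -> bytes:
    """Encode one flags byte per object, given each object's path in pack order."""
    return bytes(FLAG_GITFILE if is_gitfile(p) else 0 for p in paths)


def gitfile_positions(flags: bytes) -> List[int]:
    """Return the pack positions whose FLAG_GITFILE bit is set."""
    return [i for i, f in enumerate(flags) if f & FLAG_GITFILE]


def blobs_to_send(sizes: Sequence[int], flags: bytes, max_bytes: int) -> List[int]:
    """Keep blobs at or under the size limit, plus any flagged .git file regardless of size."""
    return [
        i
        for i, (size, f) in enumerate(zip(sizes, flags))
        if size <= max_bytes or f & FLAG_GITFILE
    ]
```

Unlike the name-hash, which is a lossy hash of the path, the flag bit answers the ".git-ness" question directly. Note that `blobs_to_send` couples the size limit and the .git exception, which is exactly the coupling a narrow clone would not want.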
I agree with Peff here. I've been working on my partial/narrow/sparse clone/fetch ideas since my original RFC and have come to the conclusion that the server can do the size limiting efficiently, but we should leave the pathname filtering to the client.

That is, let the client get the commits and trees, locally apply pattern matching (whether that be a sparse-checkout definition or simple ".git*" matching), and then make a later request for the "blobs of interest". Whether we "fault in" the missing blobs or have a "fetch-blobs" command (like fetch-pack) to get them is open to debate.

There are concerns about the size of the requested blob-id list in a fetch-blobs approach, but I think there are ways to say "I need all of the blobs referenced by the directory /foo in commit xxxx" (and, optionally, "that aren't present in directory /foo in commit yyyy or in the range yyyy..xxxx"). (handwave)

Jeff
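A rough sketch of the client-side half, under simplifying assumptions: trees the client already has are modeled as nested dicts (name to blob id, or name to subtree), not Git's real object format. `blobs_of_interest` applies patterns locally, and `new_blobs_in_dir` computes "blobs under /foo in xxxx not present under /foo in yyyy" as a set difference. Pattern matching uses Python's `fnmatch.fnmatchcase` (a pattern without '/' is matched against the basename, otherwise against the full path), which only loosely approximates sparse-checkout/gitignore rules. The range form (yyyy..xxxx) is not covered. All helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Iterator, Set, Tuple, Union

# Simplified in-memory tree: name -> blob id (str) or subtree (dict).
Tree = Dict[str, Union[str, "Tree"]]


def walk(tree: Tree, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, blob_id) for every blob reachable from tree."""
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):
            yield from walk(entry, path + "/")
        else:
            yield path, entry


def subtree(tree: Tree, directory: str) -> Tree:
    """Return the subtree at directory ('' or '/' for the root), or {} if absent."""
    node = tree
    for part in filter(None, directory.split("/")):
        child = node.get(part)
        if not isinstance(child, dict):
            return {}
        node = child
    return node


def blobs_of_interest(tree: Tree, patterns: Iterable[str]) -> Set[str]:
    """Blob ids whose path matches any pattern (basename match if no '/' in pattern)."""
    pats = list(patterns)
    wanted = set()
    for path, oid in walk(tree):
        base = path.rsplit("/", 1)[-1]
        if any(fnmatchcase(path if "/" in p else base, p) for p in pats):
            wanted.add(oid)
    return wanted


def new_blobs_in_dir(new_root: Tree, old_root: Tree, directory: str) -> Set[str]:
    """Blob ids under directory in new_root that are not under directory in old_root."""
    new_ids = {oid for _, oid in walk(subtree(new_root, directory))}
    old_ids = {oid for _, oid in walk(subtree(old_root, directory))}
    return new_ids - old_ids
```

The resulting set of blob ids is what a "fetch-blobs" style request would carry; sending a (commit, directory, base commit) triple instead would keep that request small, at the cost of the server doing the same walk.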