On 03/15/2017 10:59 AM, Junio C Hamano wrote:
By "SHA-1s for which it wants blobs", you mean that "want" only allows one exact blob object name? I think it is necessary to support that mode of operation as a base case, and it is a good starting point.
When you know
 - you have a "partial" clone that initially asked to contain only blobs that are smaller than 10MB, and
 - you are now trying to do a "git checkout v1.0 -- this/directory" so that the directory is fully populated
instead of enumerating all the missing blobs from the output of "ls-tree -r v1.0 this/directory" on separate "want" requests, you may want to say "I want all the blobs that are not smaller than 10MB in this tree object $(git rev-parse v1.0:this/directory)".
I am not saying that you should add something like this right away, but I am wondering how you would extend the proposed system to do so. Would you add "fetch-size-limited-blob-in-tree-pack" that runs parallel to "fetch-blob-pack" request? Would you add a new type of request packet "want-blob-with-expression" for fbp-request, which is protected by some "protocol capability" exchange?
If the former, how does a client discover if a particular server already supports the new "fetch-size-limited-blob-in-tree-pack" request, so that it does not have to send a bunch of "want" request by enumerating the blobs itself?
If the latter, how does a client discover if a particular server's "fetch-blob-pack" already supports the new "want-blob-with-expression" request packet?
I'm not sure if that use case is something we need to worry about (if you're downloading x * 10MB, uploading x * 50B shouldn't be a problem, I think), but if we want to handle that use case in the future, I agree that extending this system would be difficult.
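For a rough sense of scale, here is a back-of-the-envelope sketch of that ratio. It assumes each "want" line uses the usual pkt-line framing (a 4-byte hex length, then "want ", a 40-character hex SHA-1, and a newline); the function names are made up for illustration and the real fbp-request framing may differ.

```python
# Assumption: a "want" line is framed as a pkt-line, i.e. a 4-byte hex
# length prefix followed by the payload "want <40-hex SHA-1>\n".
SHA1_HEX_LEN = 40
PKT_LEN_PREFIX = 4


def want_line_size():
    """Bytes uploaded per requested blob under the assumed framing."""
    payload = "want " + "0" * SHA1_HEX_LEN + "\n"
    return PKT_LEN_PREFIX + len(payload)


def upload_to_download_ratio(blob_size_bytes=10 * 1024 * 1024):
    """Upload cost of one want line relative to one blob of the given size."""
    return want_line_size() / blob_size_bytes


if __name__ == "__main__":
    print(want_line_size())
    print(upload_to_download_ratio())
```

Under these assumptions the upload per blob is a few millionths of the download, which is why enumerating blobs on the client side looks tolerable for this particular case.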
The best way I can think of right now is for the client to send a fetch-blob-pack request with no "want" lines and at least one "want-tree" line, and then, if there is an error (which will happen if the server is old and therefore sees that there is not at least one "want" line), to retry with the "want" lines. This allows us to add alternative ways of specifying blobs later (if we want to), but also means that upgrading a client without upgrading the corresponding server incurs a round-trip penalty.
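A minimal sketch of that try-then-retry flow, to make the round-trip penalty concrete. Everything here is hypothetical: the `send` callable, the `OldServerError` exception, the line-based request format, and the simulated `old_server` are stand-ins for illustration, not the proposed wire format or any existing Git code.

```python
class OldServerError(Exception):
    """Hypothetical error raised when the server rejects a request."""


def build_request(wants=(), want_trees=()):
    """Build request lines; the "want-tree" line is the proposed extension."""
    lines = [f"want {sha}" for sha in wants]
    lines += [f"want-tree {sha}" for sha in want_trees]
    return lines


def fetch_blobs(send, tree_sha, enumerate_blobs):
    """Ask for a whole tree first; on error, retry with one want per blob."""
    try:
        return send(build_request(want_trees=[tree_sha]))
    except OldServerError:
        # Old server: pay one extra round trip and enumerate blobs ourselves.
        return send(build_request(wants=enumerate_blobs(tree_sha)))


def old_server(request):
    """Simulated old server: insists on at least one plain "want" line."""
    wants = [line for line in request if line.startswith("want ")]
    if not wants:
        raise OldServerError("expected at least one want line")
    return [line.split()[1] for line in wants]


if __name__ == "__main__":
    blobs = fetch_blobs(old_server, "tree1", lambda tree: ["blob1", "blob2"])
    print(blobs)
```

Against the simulated old server the first request fails and the second succeeds, so the client pays exactly one wasted round trip; a server that understood "want-tree" would answer the first request directly.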
Alternatively, we could add rudimentary support for trees now and add filter-by-size later (so that such requests made to old servers will download extra blobs, but at least it works), but that still doesn't solve the general problem of specifying blobs by some rule other than their own SHA-1 or their tree's SHA-1.