On Thu, 21 Sep 2017 13:59:43 -0400 Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:

> (part 2)
>
> Additional overall comments on:
> https://github.com/jonathantanmy/git/commits/partialclone2
>
> {} I think it would help to split the blob-max-bytes filtering and the
> promisor/promised concepts and discuss them independently.
>
> {} Then we can talk about the promisor/promised functionality
> independent of any kind of filter. The net-net is that the client has
> missing objects and it doesn't matter what filter criteria or
> mechanism caused that to happen.
>
> {} blob-max-bytes is but one such filter we should have. It might be
> enough if the goal is to replace LFS (where you rarely need any given
> very large object), dynamically loading those objects as needed is
> acceptable, and the network round-trip isn't too much of a perf
> penalty.
>
> {} But if we want to do things like "sparse enlistments", where the
> client only needs a small part of the tree via sparse-checkout (for
> example, only populating 50,000 files from a tree of 3.5M files at
> HEAD), then we need more general filtering.
>
> {} And as I said above, how we choose to filter should be
> independent of how the client handles promisor/promised objects.

I agree that they are independent. (I put everything together so that
people could see how they work together, but they can be changed
independently of each other.)

> {} Also, if we rely strictly on dynamic object fetching to fetch
> missing objects, we are effectively tethered to the server during
> operations (such as checkout) that the user might not think of as
> requiring a network connection. And we are forced to keep the same
> limitations as LFS, in that you can't prefetch and go offline (without
> actually checking out to your worktree first). And we can't bulk- or
> parallel-fetch objects.

I don't think dynamic object fetching precludes any other more
optimized way of fetching or prefetching - I implemented dynamic object
fetching first so that we would have a fallback, but the others
definitely can be implemented too.

> {} I think it would also help to move the blob-max-bytes calculation
> out of pack-objects.c : add_object_entry() [1]. The current code
> isolates the computation there so that only pack-objects can do the
> filtering.
>
> Instead, put it in list-objects.c and traverse_commit_list() so
> that pack-objects and rev-list can share it (as Peff suggested [2] in
> response to my first patch series in March).
>
> For example, this would let the client have a pre-checkout hook,
> use rev-list to compute the set of missing objects needed for that
> commit, and pipe that to a command to BULK fetch them from the server
> BEFORE starting the actual checkout. This would allow the savvy user
> to manually run a prefetch before going offline.
>
> [1]
> https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110
>
> [2]
> https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@xxxxxxxxxxxxxxxxxxxxx/

In your specific example, how would rev-list know, on the client, to
include (or exclude) a large blob in its output if it does not have it,
and thus does not know its size?

My reason for putting it in pack-objects.c is that I only needed it
there and it is much simpler, but I agree that if it can be used
elsewhere, we can put it in a more general place.
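To make that concrete, here is a rough sketch of what a size filter
shared via traverse_commit_list() could look like, and where the
client-side problem shows up. It is written loosely against this era's
APIs and is purely illustrative - the struct, the function name, and
the callers are made up, not part of either series:

        /*
         * Illustrative sketch only (not from either series): a
         * show_object-style callback that rev-list or pack-objects
         * could pass to traverse_commit_list(), applying a blob size
         * limit during the traversal itself.  "struct size_filter_data"
         * and the function name are hypothetical.
         */
        #include "cache.h"
        #include "object.h"

        struct size_filter_data {
                unsigned long max_bytes; /* hypothetical blob size limit */
        };

        static void show_object_size_filtered(struct object *obj,
                                              const char *name,
                                              void *cb_data)
        {
                struct size_filter_data *d = cb_data;
                unsigned long size;

                if (obj->type == OBJ_BLOB) {
                        if (sha1_object_info(obj->oid.hash, &size) < 0) {
                                /*
                                 * The blob is not present locally, so
                                 * its size is unknown; a client-side
                                 * size filter cannot decide anything
                                 * here.
                                 */
                                return;
                        }
                        if (size > d->max_bytes)
                                return; /* over the limit: filtered out */
                }
                printf("%s %s\n", oid_to_hex(&obj->oid), name ? name : "");
        }

On the server side, pack-objects has the blob and its size, so the
check is straightforward; for a blob the client does not have, the size
lookup fails, which is exactly the question raised above.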
> {} This also locks us into size-only filtering and makes it more
> difficult to add other filters, in that the add_object_entry() code
> gets called on an object only after the traversal has already decided
> what to do with it. It would be difficult to add tree-trimming at this
> level, for example.

That is true.

> {} An early draft of this type of filtering is here [3]. I hope to
> push up a revised draft of this shortly.
>
> [3]
> https://public-inbox.org/git/20170713173459.3559-1-git@xxxxxxxxxxxxxxxxx/

OK - I'll take a look when that is done (I think I commented on an
earlier version of that).