On Thu, Apr 15, 2021 at 2:53 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Is the assumption that such an implementation of VFS would fetch
> individual tree object with the existing "fetch this single object
> by the object name" interface?

That is the general idea, yes.

> What I am wondering is, as an ingredient for implementing VFS layer,
> if this is a bit too low level.  To respond to "ls -l" when you only
> have a tree object name, you'd need two roundtrips, one to retrieve
> the tree, and then after parsing the tree to find out what objects
> the tree refers to with what pathname component, you'd issue the
> object-info for all of them in a single request.

Yes, as it is designed you would need to do that. There are a few
reasons for this implementation to work the way it does:

1 - Although it currently returns only object sizes, it is designed
so that it can be extended to return other object metadata if needed.

2 - Doing it like this is a fully backwards-compatible change; older
clients keep working without changes (they just do not make use of
it).

3 - In a real filesystem, it is common to have multiple directories
under a tree, so in practice you can optimize requests (if needed) by
retrieving several tree objects and doing a single object-info
request for all objects in those trees.

Note I am not saying it could not be done in a different way, but it
does look to me like this change strikes a good balance between what
it provides and its cost.

> If a request takes a single (or multiple) tree object name, and lets
> you retrieve _both_ the tree object itself _and_ object-info for the
> objects the tree refers to, you can build "ls -l" with a single
> roundtrip instead.

That is true. But it seems to me that the existing protocol currently
has no good place to fit size (and maybe eventually other metadata)
information. It is quite possible that I am simply missing something,
though.
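
As a rough illustration of the two-step flow discussed above (fetch a
tree, parse it, then ask for the sizes of everything it references in
one request), here is a minimal Python sketch. It is not taken from
the patch. The helper names (parse_tree, pkt_line,
object_info_request) are hypothetical, it assumes SHA-1 object names,
and the request layout (command line, delimiter packet, "size", one
"oid <oid>" line per object, flush packet) is an assumption based on
the general protocol v2 command format, so the actual wire format in
the patch may differ.

```python
def parse_tree(body: bytes):
    """Yield (mode, name, oid_hex) for each entry of a raw tree object body.

    Assumes SHA-1 trees: each entry is "<mode> <name>\\0" followed by a
    20-byte binary object id.
    """
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        oid = body[nul + 1:nul + 21].hex()
        pos = nul + 21
        yield mode, name, oid


def pkt_line(payload: str) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    data = payload.encode() + b"\n"
    return b"%04x" % (len(data) + 4) + data


def object_info_request(oids):
    """Build a single object-info request asking for the size of every oid.

    Assumed layout, not confirmed against the patch.
    """
    parts = [pkt_line("command=object-info"), b"0001", pkt_line("size")]
    parts += [pkt_line(f"oid {oid}") for oid in oids]
    parts.append(b"0000")  # flush packet ends the request
    return b"".join(parts)
```

Entries from several trees can be collected before calling
object_info_request, which is the batching mentioned in point 3: the
number of object-info roundtrips then depends on how many batches are
sent, not on how many directories are listed.
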

> I do not know how much the latency matters (or more importantly, how
> much a naïve counter-proposal like the above would help), but it is
> what immediately came to my mind.

I do not expect the latency of the object-info request to be an
issue, especially because the information can be fetched in batches
and, in some cases, prefetched (since we do not need to download the
objects themselves, we do not need to worry about downloading
possibly gigabytes of data while just iterating through directories).
But, of course, if there is a clean way to make things even better, I
am all for it.

> Assuming that we are good with an interface that needs two requests
> to obtain "object contents" and "object info" separately, I find
> what in this patch quite reasonable, though (admittedly, I've
> already read this patch during internal review number of times).

FWIW, I gave this a lot of thought, and for the purposes of doing a
remote file system (and one can even optimize git ls-tree to be a lot
faster for partial clones using this), I feel confident about the
change (modulo missing something, of course).

Thanks for your comments.