Re: [PATCH] object-info: support for retrieving object info

Bruno Albuquerque <bga@xxxxxxxxxx> · Thu, 15 Apr 2021 16:06:40 -0700

On Thu, Apr 15, 2021 at 2:53 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Is the assumption that such an implementation of VFS would fetch
> individual tree object with the existing "fetch this single object
> by the object name" interface?

That is the general idea, yes.

> What I am wondering is, as an ingredient for implementing VFS layer,
> if this is a bit too low level.  To respond to "ls -l" when you only
> have a tree object name, you'd need two roundtrips, one to retrieve
> the tree, and then after parsing the tree to find out what objects
> the tree refers to with what pathname component, you'd issue the
> object-info for all of them in a single request.

Yes, as it is designed you would need to do that. There are a few
reasons for this implementation to do it the way it does:

1 - Although it is currently only returning object sizes, it was
designed in a way that it can be extended if we need to return other
object metadata.
2 - Doing it like this is a fully backwards-compatible change and
older clients would still work without changes (just not make use of
this).
3 - In a real filesystem, it is common to have multiple directories
under a tree so in practice you can optimize requests (if needed) by
retrieving several tree objects and doing a single object-info request
for all objects in those trees.
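
To make point 3 concrete, here is a rough Python sketch of the client
side: parse one or more raw tree objects, collect the object names they
reference, and build a single object-info request for all of them. It
is only an illustration. The helper names (parse_tree, pkt_line,
object_info_request) are made up for this mail, it assumes SHA-1
(20-byte) object names, and the "size" / "oid <oid>" argument lines
reflect my reading of the patch; a real client would likely also send
capabilities such as agent before the delimiter packet.

```python
FLUSH = b"0000"  # flush-pkt: ends the request
DELIM = b"0001"  # delim-pkt: separates the command section from its arguments


def parse_tree(raw: bytes):
    """Return (mode, name, oid_hex) for each entry of a raw tree object body.

    Each entry is "<mode> <name>\\0" followed by the 20 raw bytes of a
    SHA-1 object name (assumption: the repository uses SHA-1).
    """
    entries = []
    i = 0
    while i < len(raw):
        sp = raw.index(b" ", i)      # end of the mode field
        nul = raw.index(b"\0", sp)   # end of the path component
        mode = raw[i:sp].decode()
        name = raw[sp + 1:nul].decode()
        oid = raw[nul + 1:nul + 21].hex()
        entries.append((mode, name, oid))
        i = nul + 21
    return entries


def pkt_line(payload: str) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    data = payload.encode()
    return b"%04x" % (len(data) + 4) + data


def object_info_request(oids):
    """Build one protocol v2 object-info request asking for the size of every oid."""
    out = [pkt_line("command=object-info\n"), DELIM, pkt_line("size\n")]
    out += [pkt_line(f"oid {oid}\n") for oid in oids]
    out.append(FLUSH)
    return b"".join(out)


def request_for_trees(raw_trees):
    """Collect the oids referenced by several trees into a single request."""
    seen = {}
    for raw in raw_trees:
        for _mode, _name, oid in parse_tree(raw):
            seen.setdefault(oid, None)  # de-duplicate, keep first-seen order
    return object_info_request(list(seen))
```

So the first roundtrip fetches the trees, and request_for_trees() then
turns any number of them into one second roundtrip.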

Note I am not saying it could not be done in a different way, but it
does look to me like this change strikes a good balance between what it
provides and its cost.

> If a request takes a single (or multiple) tree object name, and lets
> you retrieve _both_ the tree object itself _and_ object-info for the
> objects the tree refers to, you can build "ls -l" with a single
> roundtrip instead.

That is true. But it seems to me that the existing protocol currently
has no good place to fit size information (and maybe other metadata
eventually). It is possible, though, that I am simply missing
something.

> I do not know how much the latency matters (or more importantly, how
> much a naïve counter-proposal like the above would help), but it is
> what immediately came to my mind.

I do not expect the latency of the object-info request to be an issue
especially because fetching the information can be done in batches and
also prefetched in some cases (as we do not need to download the
objects, we do not need to worry about downloading possibly gigabytes
of data while just iterating through directories). But, of course,
assuming there is a clean way to make things even better, I am all for
it.
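
As a rough illustration of the batching and prefetching I have in mind,
a client could keep a local cache of sizes it already knows and only
ask for the rest, in fixed-size batches. The function name
plan_requests, the dict-based cache, and the batch size are all
hypothetical; nothing in the patch prescribes them.

```python
from itertools import islice


def plan_requests(oids, known_sizes, batch_size):
    """Split the oids whose size is still unknown into request-sized batches.

    oids        -- object names seen while walking directories (may repeat)
    known_sizes -- mapping of oid to size for objects already queried
    batch_size  -- maximum number of oids to put in one object-info request
    """
    # Drop duplicates (keeping first-seen order) and anything already cached.
    pending = [oid for oid in dict.fromkeys(oids) if oid not in known_sizes]
    it = iter(pending)
    batches = []
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return batches
        batches.append(chunk)
```

Each returned batch would become one object-info request, and the
answers would be stored in known_sizes so that later directory
listings do not need to ask again.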

> Assuming that we are good with an interface that needs two requests
> to obtain "object contents" and "object info" separately, I find
> what is in this patch quite reasonable, though (admittedly, I've
> already read this patch during internal review a number of times).

FWIW, I gave this a lot of thought and for the purposes of doing a
remote file system (and one can even optimize git ls-tree to be a lot
faster for partial clones using this), I feel confident about the
change (modulo missing something, of course).

Thanks for your comments.