Re: RFC on packfile URIs and .gitmodules check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Would this still behave if the $DAYJOB's packfile-uri server support was
> behaving as documented in packfile-uri.txt, or just because it has
> outside-spec behavior?
> 
> I.e. the spec[1] says this:
> 
>     This is the implementation: a feature, marked experimental, that
>     allows the server to be configured by one or more
>     `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list
>     of objects to be sent is assembled, all such blobs are excluded,
>     replaced with URIs. The client will download those URIs, expecting
>     them to each point to packfiles containing single blobs.
> 
> Which I can't see leaving an opening for more than packfile-uri being to
> serve up packfiles which each contain a single blob.

I meant to leave an opening by referring to this just as a Minimum
Viable Product and by explaining in Future Work that the protocol allows
evolution of (among other things) which objects the server sends through
a URI without any protocol changes.

But in any case, this will also happen even if we constrain ourselves to
excluding single blobs and sending them via other packfiles instead -
see below.

> In that case it seems to me we'd be OK (but I haven't tested), because
> fsck_finish() will call read_object_file() which'll try to read that
> "blob from the object store when it encounters the ".gitmodules" tree,
> and because we'd have already downloaded the packfile with the blob
> before moving onto the main dialog.

We wouldn't be OK, actually. Suppose we have a separate packfile
containing only the ".gitmodules" blob - when we call fsck_finish(), we
would not have downloaded the other packfile yet. Git processes the
entire fetch response by piping the inline packfile (after demux) into
index-pack (which is the one that calls fsck_finish()) before it
downloads any of the other packfile(s).

> But as we discussed on-list before[2] this isn't the way packfile-uri
> actually works in the wild. It's really just sending some arbitrary data
> in a pack in that URI, with a server that knows what's in that pack and
> will send the rest in such a way that everything ends up being
> connected.
> 
> As far as I can tell the only reason this is called "packfile URI" and
> behaves this way in git.git is because of the convenience of
> intrumenting pack-objects.c with an "oidset excluded_by_config" to not
> stream those blobs in a pack, but it isn't how the only (I'm pretty
> sure) production server implementation in the wild behaves at all.

I don't know if this is the only production server implementation, but
yes, this particular one (googlesource.com) can put objects of multiple
types in the other packfile, not only a single blob. There is some JGit
code here [1] that can send a URI corresponding to a "CachedPack" (which
may contain all objects, not only blobs) if that pack is also available
through a URI.

[1] https://gerrit.googlesource.com/jgit/+/a004820858b54d18c6f72fc94dc33bce8b606d66

> So *poke* about the reply I had in [3] late last year. I think the first
> thing worth doing here is fixing the docs so they describe how this
> works. You didn't get back on that (and I also forgot about it until
> this thread), but it would be nice to know what you think about the
> suggested prose there.

Rereading that, the issue is that uploadpack.blobPackfileUri is indeed
how the current Git server handles it - it excludes a blob and sends a
URI instead. The client is not supposed to see how the server has
configured it, and should not be constrained by the fact that the server
that is being shipped with it only excludes single blobs.

> Re-reading it I'd add something like this to the spec:
> 
>  A. That the config is called "uploadpack.blobPackfileUri" in git.git
>     has nothing to do with how this is expected to behave on the
>     wire. It's just to serve the narrow support pack-objects.c has for
>     crafting such a pack.

Yes, that's true.

>  B. It's then called "packfile-uris" on the wire, nothing to do with
>     blobs. Just packs with a checksum that we'll validate. An older
>     versions of this spec said "[a] packfiles containing single blobs"
>     but it can be any combination of blob/tree/commit data.

Yes, we can delete that line.

>  C. A client is then expected to deal with any combination of data
>     ordered/sliced/split up etc. in any possible way from such a
>     combination of "packfile-uris" and PACK dialog, as long as the end
>     result is valid.
> 
> Except that the result of this discussion will perhaps be a more narrow
> definition for "C".

Yes. I think all these can be done just by changing the last sentence in
"Server design" - I'll send a patch.

> 1. https://github.com/git/git/blob/cd8402e0fd8cfc0ec9fb10e22ffb6aabd992eae1/Documentation/technical/packfile-uri.txt#L37-L41
> 2. https://lore.kernel.org/git/20201125190957.1113461-1-jonathantanmy@xxxxxxxxxx/
> 3. https://lore.kernel.org/git/87tut5vghw.fsf@xxxxxxxxxxxxxxxxxxx/



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux