Re: RFC on packfile URIs and .gitmodules check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> > Is this because the gitmodules blob is contained in the base image
> > served via the pack URI mechansim, and the "dynamic" packfile for
> > the latest part of the history refers to the gitmodules file that is
> > unchanged, hence the latter one lacks it?
> 
> That seems like a likely explanation, although this seems ultimately up
> to what the pack CDN serves.

In this case, yes, that is what is happening.

> > You've listed two possible solutions, i.e.
> >
> >  (1) punt and declare that we assume an missing and uncheckable blob
> >      is OK,
> >
> >  (2) defer the check after transfer completes.
> >
> > Between the two, my gut feeling is that the latter is preferrable.
> > If we assume an missing and uncheckable one is OK, then even if a
> > blob is available to be checked, there is not much point in
> > checking, no?
> 
> I'm going to second this. If this were a more benign check, then I'd
> perhaps feel differently, but .gitmodules fsck checks seem to get
> hardened fairly often during security releases, and so it seems
> important to keep performing them when the user asked for it.

That makes sense.

> > As long as the quarantine of incoming pack works correctly,
> > streaming the incoming packdata (and packfile downloaded out of line
> > via a separate mechanism like pack URI) to index-pack that does not
> > check to complete the transfer, with a separate step to check the
> > sanity of these packs as a whole, should not harm the repository
> > even if it is interrupted in the middle, after transfer is done but
> > before checking says it is OK.
> 
> Agreed. Bear in mind that I am pretty unfamiliar with this code, and so
> I'm not sure if it's 'easy' or not to change it in this way. The obvious
> downside, which Jonathan notes, is that you almost certainly have to
> reinflate all of the trees again.
> 
> But, since the user is asking for transfer.fsckObjects explicitly, I
> don't think that it's a problem.

We might be able to avoid the reinflate if we do it as part of the
connectivity check or somehow teach index-pack a way to communicate the
dangling .gitmodules links (as you suggest below).

> > As a potential third option, I wonder if it is easier for everybody
> > involved (including third-party implementation of their
> > index-pack/fsck equivalent) if we made it a rule that a pack that
> > has a tree that refers to .git<something> must include the blob for
> > it?
> 
> Interesting, but I'm sure CDN administrators would prefer to have as few
> restrictions in place as possible.

That rule would help, but it also seems inelegant in that if we put
commits that have the same .gitmodules in 2 or more different packs,
there would be identical objects across those packs (besides the reason
Taylor mentioned).

> A potential fourth option that I can think of is that we can try to
> eagerly perform the .gitmodules fsck checks as we receive objects, under
> the assumption that the .gitmoudles blob and the tree which contains it
> appear in the same pack.
> 
> If they do, then we ought to be able to check them as we currently do
> (and avoid leaving them to the slow post-processing step). Any blobs
> that we _can't_ find get placed into an array, and then that array is
> iterated over after we have received all packs, including from the CDN.
> Any blobs that couldn't be found in the pack transferred from the
> remote, the CDN, or the local repository (and isn't explicitly excluded
> via an object --filter) is declared missing.
> 
> Thoughts?

The hard part is communicating this array to the parent fetch process.
Stolee has a suggestion [1] which I will reply to directly.

[1] https://lore.kernel.org/git/d2ca2fec-a353-787a-15a7-3831a665523e@xxxxxxxxx/



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux