Re: RFC on packfile URIs and .gitmodules check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/15/2021 10:22 PM, Taylor Blau wrote:
> On Fri, Jan 15, 2021 at 04:30:07PM -0800, Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>>
>>> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
>>> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
>>> (which can happen when packfile URIs are used), transfer.fsckobjects
>>> causes the fetch to fail. You can reproduce it as follows (as of the
>>> time of writing):
>>>
>>>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>>>   Cloning into 'codesearch'...
>>>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>>>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>>>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>>>   fatal: fsck error in pack objects
>>>   fatal: index-pack failed

I'm contributing a quick suggestion for just this item:

>>> This happens because the fsck part is currently being done in
>>> index-pack, which operates on one pack at a time. When index-pack sees
>>> the tree, it runs fsck on it (like any other object), and the fsck
>>> subsystem remembers the .gitmodules target (specifically, in
>>> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
>>> checks if the target exists, but it doesn't, so it reports the failure.
>>
>> Is this because the gitmodules blob is contained in the base image
>> served via the pack URI mechansim, and the "dynamic" packfile for
>> the latest part of the history refers to the gitmodules file that is
>> unchanged, hence the latter one lacks it?
> 
> That seems like a likely explanation, although this seems ultimately up
> to what the pack CDN serves.
>> You've listed two possible solutions, i.e.
>>
>>  (1) punt and declare that we assume an missing and uncheckable blob
>>      is OK,
>>
>>  (2) defer the check after transfer completes.
>>
>> Between the two, my gut feeling is that the latter is preferrable.
>> If we assume an missing and uncheckable one is OK, then even if a
>> blob is available to be checked, there is not much point in
>> checking, no?
> 
> I'm going to second this. If this were a more benign check, then I'd
> perhaps feel differently, but .gitmodules fsck checks seem to get
> hardened fairly often during security releases, and so it seems
> important to keep performing them when the user asked for it.

It might be nice to teach 'index-pack' a mode that says certain
errors should be reported as warnings by writing the problematic
OIDs to stdout/stderr. Then, the second check after all packs are
present can focus on those problematic objects instead of
re-scanning everything.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux