On 1/15/2021 10:22 PM, Taylor Blau wrote: > On Fri, Jan 15, 2021 at 04:30:07PM -0800, Junio C Hamano wrote: >> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes: >> >>> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the >>> .gitmodules blob itself are sent in 2 separate packfiles during a fetch >>> (which can happen when packfile URIs are used), transfer.fsckobjects >>> causes the fetch to fail. You can reproduce it as follows (as of the >>> time of writing): >>> >>> $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch >>> Cloning into 'codesearch'... >>> remote: Total 2242 (delta 0), reused 2242 (delta 0) >>> Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done. >>> error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob >>> fatal: fsck error in pack objects >>> fatal: index-pack failed I'm contributing a quick suggestion for just this item: >>> This happens because the fsck part is currently being done in >>> index-pack, which operates on one pack at a time. When index-pack sees >>> the tree, it runs fsck on it (like any other object), and the fsck >>> subsystem remembers the .gitmodules target (specifically, in >>> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which >>> checks if the target exists, but it doesn't, so it reports the failure. >> >> Is this because the gitmodules blob is contained in the base image >> served via the pack URI mechansim, and the "dynamic" packfile for >> the latest part of the history refers to the gitmodules file that is >> unchanged, hence the latter one lacks it? > > That seems like a likely explanation, although this seems ultimately up > to what the pack CDN serves. >> You've listed two possible solutions, i.e. >> >> (1) punt and declare that we assume an missing and uncheckable blob >> is OK, >> >> (2) defer the check after transfer completes. >> >> Between the two, my gut feeling is that the latter is preferrable. >> If we assume an missing and uncheckable one is OK, then even if a >> blob is available to be checked, there is not much point in >> checking, no? > > I'm going to second this. If this were a more benign check, then I'd > perhaps feel differently, but .gitmodules fsck checks seem to get > hardened fairly often during security releases, and so it seems > important to keep performing them when the user asked for it. It might be nice to teach 'index-pack' a mode that says certain errors should be reported as warnings by writing the problematic OIDs to stdout/stderr. Then, the second check after all packs are present can focus on those problematic objects instead of re-scanning everything. Thanks, -Stolee