On 1/23/2023 1:03 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: Derrick Stolee <derrickstolee@xxxxxxxxxx> >> >> When unbundling a bundle, the verify_bundle() method checks two things >> with regards to the prerequisite commits: >> >> 1. Those commits are in the object store, and >> 2. Those commits are reachable from refs. >> >> During testing of the bundle URI feature, where multiple bundles are >> unbundled in the same process, the ref store did not appear to be >> refreshing with the new refs/bundles/* references added within that >> process. This caused the second half -- the reachability walk -- report >> that some commits were not present, despite actually being present. >> >> One way to attempt to fix this would be to create a way to force-refresh >> the ref state. That would correct this for these cases where the >> refs/bundles/* references have been updated. However, this still is an >> expensive operation in a repository with many references. >> >> Instead, optionally allow callers to skip this portion by instead just >> checking for presence within the object store. Use this when unbundling >> in bundle-uri.c. > > This step is new in this round. > > I am assuming that this approach is to avoid repeated "now we > unbundled one, let's spend enormous cycles to update the in-core > view of the ref store before processing the next bundle"---instead > we unbundle all, assuming the prerequisites for each and every > bundle are satisfied. We are specifically removing the requirement that the objects are reachable from refs, we still check that the objects are in the object store. Thus, we can only be in a bad state afterwards if the required objects for a bundle were in the object store, previously unreachable, and one of these two things happened: 1. Some objects reachable from those required commits were already missing in the repository (so the repo's object store was broken but only for some unreachable objects). 2. A GC pruned those objects between verifying the bundle and writing the refs/bundles/* refs after unbundling. I think we should trust the repository to not be in the first state, but the race condition in the second option will create a state where we have missing objects that are now reachable from refs. > I am OK as long as we check the assumption holds true at the end; > this looks like a good optimization. So are you recommending that we verify all objects reachable from the new refs/bundles/* are present after unbundling? That would prevent the possibility of a GC race, but at some significant run- time performance costs. Do we do the same as we unpack from a fetch? Do we apply the same scrutiny to the objects during unbundling as we do from a fetched pack? They both use 'git index-pack --stdin --fix-thin', so my guess is that we do the same amount of checks for both cases. Since this is only one of multiple ways to add objects that depend on possibly-unreachable objects, my preferred way to avoid the GC race is by using care in the GC process itself (such as the new --expire-to option to recover from these races). Thanks, -Stolee