From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>

While updating the microsoft/git fork on top of v2.26.0-rc0 and
consuming that build into Scalar, I noticed a corner-case bug around
partial clone.

The "scalar clone" command can create a Git repository with the proper
config for using partial clone with the "blob:none" filter. Instead of
calling "git clone", it runs "git init" and then sets a few more config
values before running "git fetch".

In our builds on v2.26.0-rc0, we noticed that our "git fetch" command
was failing with

  error: https://github.com/microsoft/scalar did not send all necessary objects

This failure does not happen if you copy the config file from a
repository created by "git clone --filter=blob:none <url>", but it does
happen when the config option "core.logAllRefUpdates = true" is added.

By debugging, I was able to see that the loop inside check_connected()
that checks whether all refs are contained in promisor packs was
iterating over a packed_git list that contained no packfiles at all.

I'm not sure what corner-case issues caused this config option to
prevent the reprepare_packed_git() from being called at the proper
spot during the fetch operation. This approach requires a situation
where we use the remote helper process, which makes it difficult to
test.

It is possible to place a reprepare_packed_git() call in the fetch code
closer to where we receive a pack, but that leaves an opening for a
later change to re-introduce this problem. Further, a concurrent repack
operation could replace the pack-file list we already loaded into
memory, causing this issue in an even harder to reproduce scenario.

It is really the responsibility of anyone looping through the list of
pack-files for a certain object to fall back to reprepare_packed_git()
on a fail-to-find. The loop in check_connected() does not have this
fallback, leading to this bug.

We _could_ try looping through the packs and only reprepare the packs
after a miss, but that change is more involved and has little value.
Since this case is isolated to the case when
opt->check_refs_are_promisor_objects_only is true, we are confident that
we are verifying the refs after downloading new data. This implies that
calling reprepare_packed_git() in advance is not a huge cost compared to
the rest of the operations already made.

Helped-by: Jeff King <peff@xxxxxxxx>
Helped-by: Junio Hamano <gitster@xxxxxxxxx>
Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
---
connected.c: reprepare packs for corner cases

I included how I found this (integrating v2.26.0-rc0 into Scalar), but
I am able to reproduce it on my Linux machine using real fetches from
github.com. I'm not sure why I was unable to reproduce the issue in
test cases using file:// URLs or the HTTP tests.

Update in V2: I've updated the commit message to discuss the options
presented on-list and to explain why I'm keeping the code unchanged in
light of that discussion.
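For reference, the "reprepare only after a miss" option discussed in
the commit message would look roughly like the sketch below. This is
not part of the patch, just an illustration of the shape I decided
against. It assumes the existing get_all_packs(), find_pack_entry_one()
and reprepare_packed_git() helpers, the pack_promisor bit on struct
packed_git, and the surrounding check_connected() locals (oid, fn,
cb_data); the rescanned flag and the labels are only for illustration,
so treat the exact calls as approximate:

    /* Sketch only: reprepare the pack list after a failed lookup. */
    int rescanned = 0;

    do {
        struct packed_git *p;

    retry:
        for (p = get_all_packs(the_repository); p; p = p->next) {
            if (!p->pack_promisor)
                continue;
            if (find_pack_entry_one(oid.hash, p))
                goto promisor_pack_found;
        }
        if (!rescanned) {
            /*
             * A miss may only mean our in-memory list of packs is
             * stale, so rescan the pack directory once and retry
             * before reporting a failure.
             */
            rescanned = 1;
            reprepare_packed_git(the_repository);
            goto retry;
        }
        return 1;
    promisor_pack_found:
        ;
    } while (!fn(cb_data, &oid));
    return 0;

That extra bookkeeping is why I prefer the single up-front
reprepare_packed_git() call in this patch.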
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-579%2Fderrickstolee%2Ffetch-reprepare-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-579/derrickstolee/fetch-reprepare-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/579

Range-diff vs v1:

 1:  cde7aa20ca8 ! 1:  696a51bd5a0 connected.c: reprepare packs for corner cases
     @@ -27,17 +27,31 @@
          I'm not sure what corner-case issues caused this config option to
          prevent the reprepare_packed_git() from being called at the proper
     -    spot during the fetch operation. Even worse, I have failed to create
     -    a test case to prevent a regression.
     +    spot during the fetch operation. This approach requires a situation
     +    where we use the remote helper process, which makes it difficult to
     +    test.
      
     -    Placing a reprepare_packed_git() call inside chck_connected() before
     -    looping through the packed_git list seems like the safest way to
     -    avoid this issue in the future. While reprepare_packed_git() does
     -    another scan of the pack directory, it is not terribly expensive as
     -    long as we do not run it in a loop. We check connectivity only a
     -    few times per command, so this will not have a meaningful performance
     -    impact. In exchange, we get extra safety around this check.
     +    It is possible to place a reprepare_packed_git() call in the fetch code
     +    closer to where we receive a pack, but that leaves an opening for a
     +    later change to re-introduce this problem. Further, a concurrent repack
     +    operation could replace the pack-file list we already loaded into
     +    memory, causing this issue in an even harder to reproduce scenario.
     +
     +    It is really the responsibility of anyone looping through the list of
     +    pack-files for a certain object to fall back to reprepare_packed_git()
     +    on a fail-to-find. The loop in check_connected() does not have this
     +    fallback, leading to this bug.
     +
     +    We _could_ try looping through the packs and only reprepare the packs
     +    after a miss, but that change is more involved and has little value.
     +    Since this case is isolated to the case when
     +    opt->check_refs_are_promisor_objects_only is true, we are confident that
     +    we are verifying the refs after downloading new data. This implies that
     +    calling reprepare_packed_git() in advance is not a huge cost compared to
     +    the rest of the operations already made.
     +
     +    Helped-by: Jeff King <peff@xxxxxxxx>
     +    Helped-by: Junio Hamano <gitster@xxxxxxxxx>
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
          diff --git a/connected.c b/connected.c


 connected.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/connected.c b/connected.c
index 7e9bd1bc622..ac52b07b474 100644
--- a/connected.c
+++ b/connected.c
@@ -61,7 +61,11 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
 		 * object is a promisor object. Instead, just make sure we
 		 * received, in a promisor packfile, the objects pointed to by
 		 * each wanted ref.
+		 *
+		 * Before checking for promisor packs, be sure we have the
+		 * latest pack-files loaded into memory.
 		 */
+		reprepare_packed_git(the_repository);
 		do {
 			struct packed_git *p;
 

base-commit: b4374e96c84ed9394fed363973eb540da308ed4f
--
gitgitgadget