On Mon, May 08, 2023 at 04:53:35PM -0400, Derrick Stolee wrote: > On 5/8/2023 1:38 PM, Taylor Blau wrote: > > > - /* > > - * if we have a HAVES list, but none of those haves is contained > > - * in the packfile that has a bitmap, we don't have anything to > > - * optimize here > > - */ > > - if (haves && !in_bitmapped_pack(bitmap_git, haves)) > > - goto cleanup; > > + use_boundary_traversal = git_env_bool(GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL, -1); > > + if (use_boundary_traversal < 0) { > > + prepare_repo_settings(revs->repo); > > + use_boundary_traversal = revs->repo->settings.pack_use_bitmap_boundary_traversal; > > + } > > + > > + if (!use_boundary_traversal) { > > + /* > > + * if we have a HAVES list, but none of those haves is contained > > + * in the packfile that has a bitmap, we don't have anything to > > + * optimize here > > + */ > > + if (haves && !in_bitmapped_pack(bitmap_git, haves)) > > + goto cleanup; > > + } > > I was reading along, nodding my head, until I came across this comment. > I recognize that it's moved from an existing place, but this seems very > strange and incorrect. > > Is this implying that if the 'haves' are not in the bitmapped pack, then > we _can't_ construct a bitmap representing the objects they can reach? > > Do we not have the ability to extend the object order beyond the bitmapped > pack in-memory in a way that allows us to provide bit positions for the > objects not in the bitmapped pack? It is a strange heuristic indeed, and I had the same thought as you the first time I remember encountering this part of the code. in_bitmapped_pack() asks whether there is at least one of the given set of objects which appears in the pack/MIDX corresponding with our bitmap. In other words, this heuristic causes us to bail when none of the objects listed as haves appear in the corresponding pack/MIDX. Strictly speaking, there isn't a fundamental limitation preventing us from using the bitmap traversal in cases where none of the tips exist in the bitmapped pack/MIDX. That's what the extended index is for (see `bitmap_position_extended()` and `ext_index_add_object()` for more details). AFAICT, the heuristic here is another way of saying, "if none of these haves appear in the corresponding pack or MIDX, my bitmap coverage is likely bad enough that it's not worth trying to construct a full bitmap on the haves side (and we should fall back to the ordinary object traversal)". > I can understand saying "it might be more expensive to construct a bitmap > here, because we need to walk into the bitmapped pack before we have a > hope of hitting a bitmap." However, that's far from "we don't have anything > to optimize". Agreed. > Something to think about removing in the future, but not a blocker for > this series. Getting it out of the way for the boundary-based walk makes > even more sense because the commits to check are those in the boundary, > not the haves themselves. Yeah. I think that what we do here will end up depending on the performance of the boundary based approach. If it's a clear enough win in a majority of cases, I'd just as soon drop the classic traversal (and with it, this heuristic). But if we do end up keeping the classic traversal around for a while, I absolutely agree that we should investigate and consider dropping this optimization, which I have long-suspected as not being very useful. > > +test_expect_success 'boundary-based traversal is used when requested' ' > > + git repack -a -d --write-bitmap-index && > > + > > + for argv in \ > > + "git -c pack.useBitmapBoundaryTraversal=true" \ > > + "git -c feature.experimental=true" \ > > + "GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1 git" > > + do > > + eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \ > > + --use-bitmap-index second..other 2>perf" && > > + grep "\"region_enter\".*\"label\":\"haves/boundary\"" perf || > > + return 1 > > + done && > > + > > + for argv in \ > > + "git -c pack.useBitmapBoundaryTraversal=false" \ > > + "git -c feature.experimental=true -c pack.useBitmapBoundaryTraversal=false" \ > > + "GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c pack.useBitmapBoundaryTraversal=true" \ > > + "GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c feature.experimental=true" > > This ordering (GIT_TEST_*=0 overrides config) seems backwards to me, but > it doesn't really matter since it's a GIT_TEST_* variable. Thanks for > including tests so the order is documented. Now that you mention it, I think it's backwards too ;-). But I agree that documenting it here is sufficient (and ideally, we don't have to live with this GIT_TEST variable for more than a release or two, anyway). > > + do > > + eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \ > > + --use-bitmap-index second..other 2>perf" && > > + grep "\"region_enter\".*\"label\":\"haves/classic\"" perf || > > + return 1 > > nit: you should be able to use 'test_region' here. Probably not worth > a re-roll, as everything else looks good to me. Ah, I didn't know of `test_region` before. I'll keep it in mind, thanks! Thanks, Taylor