On Thu, May 23, 2024 at 06:40:00AM -0400, Jeff King wrote: > OK, so I think this commit is getting into the meat of how the new > bitmaps will be used. Just to restate it from a high-level to make sure > I understand, I think it is: > > 1. When we are traversing (or even before we traverse and just know > our tips), we can always say "hey, I have a commit in the bitmap; > does this satisfy any pseudo-merges?". Where "satisfy" is "all of > the commits pseudo-merged for that bitmap are already in our > result". And if so, then we can use the pseudo-merge bitmap by > OR-ing it in. > > And that's apply_pseudo_merges_for_commit(). > > 2. That "OR" operation may likewise open up new options, so we > recurse. And that's the "cascade" function. Exactly. I think implicit in the above is that your (2) is also a recursive step, since each cascade step may open us up to new pseudo-merges, which themselves may reach objects which satisfy other pseudo-merges, and so on. > > +static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm, > > + struct pseudo_merge_commit_ext *ext, size_t at) > > +{ > > + if (at >= pm->map_size) > > + return error(_("extended pseudo-merge read out-of-bounds " > > + "(%"PRIuMAX" >= %"PRIuMAX")"), > > + (uintmax_t)at, (uintmax_t)pm->map_size); > > + > > + ext->nr = get_be32(pm->map + at); > > + ext->ptr = pm->map + at + sizeof(uint32_t); > > + > > + return 0; > > +} > > I was happy to see the boundary check here. Do we need a length check, > too? We'd need at least four bytes here for the uint32_t. Does map_size > include the trailing hash? If not, then it might provide a bit of slop > (we'd read garbage, but never go outside the mmap). > > I guess the ">=" in the size check implies that we have at least one > byte, but I don't think anything promises that we're correctly 4-byte > aligned. Yeah, we could read into the trailing hash area, which would just be garbage from our perspective. But I think that adding a length check is easy enough to do, something like: --- 8< --- diff --git a/pseudo-merge.c b/pseudo-merge.c index b539791396..7d13101149 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -478,6 +478,10 @@ static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm, return error(_("extended pseudo-merge read out-of-bounds " "(%"PRIuMAX" >= %"PRIuMAX")"), (uintmax_t)at, (uintmax_t)pm->map_size); + if (at + 4 >= pm->map_size) + return error(_("extended pseudo-merge entry is too short " + "(%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)(at + 4), (uintmax_t)pm->map_size); ext->nr = get_be32(pm->map + at); ext->ptr = pm->map + at + sizeof(uint32_t); --- >8 --- > The rest of the length check is here: > > > +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm, > > + struct pseudo_merge *merge) > > +{ > > + if (!merge->loaded_commits) > > + BUG("cannot use unloaded pseudo-merge bitmap"); > > + > > + if (!merge->loaded_bitmap) { > > + size_t at = merge->bitmap_at; > > + > > + merge->bitmap = read_bitmap(pm->map, pm->map_size, &at); > > + merge->loaded_bitmap = 1; > > + } > > + > > + return merge->bitmap; > > +} > > When we call read_bitmap(), it knows where the end is, and it's > careful to avoid reading past it. Good. Yep, thanks for double checking. Thanks, Taylor