Jeff King <peff@xxxxxxxx> writes: > On Fri, Mar 24, 2023 at 02:06:43PM -0400, Derrick Stolee wrote: > >> >> + bitmap_index_seek(index, header_size, SEEK_CUR); >> >> return 0; >> >> } >> > >> > Likewise this function already has bounds checks at the top: >> > >> > if (index->map_size < header_size + the_hash_algo->rawsz) >> > return error(_("corrupted bitmap index (too small)")); >> > >> > I'd be perfectly happy if we swapped that our for checking the bounds on >> > individual reads, but the extra checking in the seek step here just >> > seems redundant (and again, too late). >> >> I think it would be nice to replace all of these custom bounds >> checks with a check within bitmap_index_seek() and error conditions >> done in response to an error code returned by that method. It keeps >> the code more consistent in the potential future of changing the >> amount to move the map_pos and the amount checked in these conditions. > > Yeah, that's what I was getting at. But doing it at seek time is too > late. We'll have just read off the end of the array. Yup. You illustrated it nicely in your response for the previous step of the series. If the typical access pattern is to check, read and then advance to the next position, and by the time you are ready to advance to the next position, you'd better have done the checking. > You really need an interface more like "make sure there are N bytes for > me to read at offset X". Then you can read and advance past them. > > For individual items where you want to copy the bytes out anyway (like a > be32) you can have a nice interface like: > > if (read_be32(bitmap_git, &commit_idx_pos) < 0 || > read_u8(bitmap_git, &xor_offset) < 0 || > read_u8(bitmap_git, &flags) < 0) > return error("truncated bitmap entry"); Yeah, that kind of flow reads really well. > But given that there is only one spot that calls these, that kind of > refactoring might not be worth it (right now it just uses the magic > number "6" right before grabbing the data). Yeah, it seems most of the callers with SEEK_SET are "I find the next offset from a table and jump there in preparation for doing something". I suspect callers with SEEK_CUR would fit in the read_X() pattern better? From that angle, it smells that the two kinds of seek functions may want to be split into two different helpers. Thanks.