Re: [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/21/2023 1:56 PM, Jeff King wrote:
> On Mon, Mar 20, 2023 at 04:02:49PM -0400, Taylor Blau wrote:
> 
>> The pack-bitmap internals use a combination of a `map` and `map_pos`
>> variable to keep a pointer to a memory mapped region of the `.bitmap`
>> file, as well as the position within that file, respectively.
>>
>> Reads within the memory mapped region are meant to mimic file reads,
>> where each read adjusts the offset of the file descriptor accordingly.
>> This functionality is implemented by adjusting the `map_pos` variable
>> after each read.
>>
>> Factor out a function similar to seek() which adjusts the `map_pos`
>> variable for us. Since the bitmap code only needs to adjust relative to
>> the beginning of the file as well as the current position, we do not
>> need to support any "whence" values outside of SEEK_SET and SEEK_CUR.
> 
> I like the idea of centralizing the bounds checks here.
> 
> I do think copying lseek() is a little awkward, though. As you note, we
> only care about SEEK_SET and SEEK_CUR, and we have to BUG() on anything
> else. Which means we have a run-time check, rather than a compile time
> check. If we had two functions like:
> 
>   void bitmap_index_seek_to(struct bitmap_index *bitmap_git, size_t pos)
>   {
> 	bitmap_git->map_pos = pos;
> 	if (bitmap_git->map_pos >= bitmap_git->map_size)
> 	   ...etc...
>   }
> 
>   void bitmap_index_incr(struct bitmap_index *bitmap_git, size_t offset)
>   {
> 	bitmap_index_seek_to(bitmap_git, st_add(bitmap_git->map_pos, offset));
>   }
> 
> then the compiler can see all cases directly. I dunno how much it
> matters.

Whenever the compiler can help us, I'm usually in favor.

In this case, I'd call them bitmap_index_seek() and bitmap_index_increment(),
which should be clear enough.

The other alternative would be to use an enum instead of an arbitrary int.
The compiler can warn to some extent, but it's not as helpful as a method
name distinction.

>> +	if (bitmap_git->map_pos >= bitmap_git->map_size)
>> +		BUG("bitmap position exceeds size (%"PRIuMAX" >= %"PRIuMAX")",
>> +		    (uintmax_t)bitmap_git->map_pos,
>> +		    (uintmax_t)bitmap_git->map_size);
> 
> This uses ">=", which is good, because it is valid to walk the pointer
> to one past the end of the map, which is effectively EOF. But as above,
> in that condition the callers should be checking for this EOF state
> before reading the bytes.

In other words, it would be too easy for a strange data shape to trigger
this BUG() statement, which should be reserved for the most-extreme cases
of developer mistake. Things like "this is an unacceptable 'whence' value"
or "this should never be NULL" make sense, but this is too likely to be
hit due to a runtime error.

Would it make sense to return an 'int' instead of the size_t of map_pos?
That way we could return in error if this is exceeded, and then all
callers can respond "oh wait, that move would exceed the file size, so
I should fail in my own way..."?

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux