On Mon, Jan 15, 2024 at 02:31:12PM -0800, Linus Arver wrote:

> > +static int pair_chunk_expect_fn(const unsigned char *chunk_start,
> > +				 size_t chunk_size,
> > +				 void *data)
> > +{
> > +	struct pair_chunk_data *pcd = data;
> > +	if (chunk_size / pcd->record_size != pcd->record_nr)
> > +		return -1;
> > +	*pcd->p = chunk_start;
> > +	return 0;
> > +}
> > +
>
> I don't think this function should assume anything about the inputs
> (chunk_size, record_size, nor record_nr). If we are asking the helper to
> be the common tool for multiple locations, it should assume even less
> about what these inputs could look like.
>
> For example, if record_size is 0 this will lead to a
> divide-by-zero. Likewise, if record_nr is zero it doesn't make
> sense to check if chunk_size fits into record_size * record_nr.

I'm not sure that divide-by-zero is a big deal, because 0-length
fixed-size records do not make any sense to ask about. And even if
somebody accidentally passed 0 (which the compiler won't catch), it
would barf on any input, so even rudimentary testing would catch it.

A zero record_nr is a perfectly reasonable thing to ask about. If you
have a chunk file with zero entries for some entity, then we are
checking that the chunk is the expected zero length as well.

> And even if there are no (unexpected) zero-values involved, shouldn't we
> also check for nonsensical comparisons, such as if chunk_size is smaller
> than record_size?

Aren't we checking that already? If chunk_size is less than record_size,
then the division will result in "0". If the expected number of records
is not also 0, then that's a bogus file.

What we really care about here is that we won't walk off the end of the
buffer while looking at N fixed-size records (in that sense, the "too
big" test is overly careful, but it's easy to include as a sanity
check).

> So in summary there appear to be the following possibilities:
>
>   CHUNK_MISSING
>   CHUNK_TOO_SMALL
>   CHUNK_OK
>   CHUNK_TOO_BIG_ALIGNED
>   CHUNK_TOO_BIG_MISALIGNED

So yes, I agree all these cases exist. We detect them all except the
"misaligned" case (which I think was discussed earlier in the thread as
not worth caring too much about). But...

> (on top of the cases where record_* inputs are zero).
>
> And it seems prudent to treat each of the not-OK cases differently
> (including how we report error messages) once we know which category we
> fall into.

I'm not sure it is worth caring too much about the different cases.
From the caller's perspective the end result is always to avoid using
the chunk/file. From the user's perspective, any error of the form "we
expected N bytes and got X" is plenty. Especially since record_nr
typically comes from another part of the file, we cannot tell if our
chunk is too big/small, or if another chunk gave us a bogus record_nr.

-Peff
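
As an aside, here is a minimal standalone sketch (illustration only; it
is not part of the patch, the sizes below are made up, and check() just
mirrors the comparison from the quoted hunk) of which of the cases
above the division-based test catches:

#include <stdio.h>
#include <stddef.h>

/* mirrors the comparison in pair_chunk_expect_fn() above */
static int check(size_t chunk_size, size_t record_size, size_t record_nr)
{
	return chunk_size / record_size != record_nr ? -1 : 0;
}

int main(void)
{
	size_t record_size = 8, record_nr = 3;

	printf("%d\n", check(24, record_size, record_nr)); /* exact size: 0 (ok) */
	printf("%d\n", check(16, record_size, record_nr)); /* too small: -1 */
	printf("%d\n", check(32, record_size, record_nr)); /* too big, aligned: -1 */
	printf("%d\n", check(27, record_size, record_nr)); /* too big, misaligned: 0 (not caught) */
	printf("%d\n", check(0, record_size, 0));          /* zero records, zero bytes: 0 (ok) */
	return 0;
}

Only the too-big-but-misaligned case slips through, which matches the
summary above.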