On Thu, Jan 30, 2014 at 03:45:38PM -0500, Jeff King wrote: > Either way, we should perhaps be more careful in the bitmap code, too, > that the values we get are sensible. It's better to die("your bitmap is > broken") than to read off the end of the array. I can't seem to trigger > the same failure mode, though. On my x86 system, turning off the > endian-swap (i.e., the opposite of what should happen) makes t5310 fail, > but it is because we end up trying to set the bit very far into a > dynamic bitfield, and die allocating memory. I think we could do this with something like the patch below, which checks two things: 1. When we expand the ewah, it has the same number of bits we claimed in the on-disk header. 2. The ewah header matches the number of objects in the packfile. The first catches a corruption in the ewah data itself, and the latter when the header is corrupted. You can test either by breaking the endian-swapping. :) Vicent, can you confirm my assumptions about the round-to-nearest-64 in the patch below? I assume that the bit_size on-disk may be rounded in some cases (and it is -- if you take out the rounding, this breaks things). Is that sane? Or should the on-disk ewah bit_size header always match the number of objects in the pack, and our writer is just being sloppy? diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index 9ced2da..a8f77cf 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -343,6 +343,18 @@ int ewah_iterator_next(eword_t *next, struct ewah_iterator *it) if (it->pointer >= it->buffer_size) return 0; + /* + * If we return more bits than the ewah advertised, then either + * our data bits or the bit_size field was corrupted, and we + * risk a caller overwriting their own buffer (if they used + * bit_size to size their buffer in the first place). + * + * We don't have a good way of returning an error here, so let's + * just die. + */ + if (!it->words_remaining--) + die("ewah bitmap contains more bits than it claims"); + if (it->compressed < it->rl) { it->compressed++; *next = it->b ? (eword_t)(~0) : 0; @@ -371,6 +383,8 @@ void ewah_iterator_init(struct ewah_iterator *it, struct ewah_bitmap *parent) it->buffer_size = parent->buffer_size; it->pointer = 0; + it->words_remaining = (parent->bit_size + 63) / 64; + it->lw = 0; it->rl = 0; it->compressed = 0; diff --git a/ewah/ewok.h b/ewah/ewok.h index 43adeb5..a3f49de 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -144,6 +144,7 @@ struct ewah_iterator { size_t buffer_size; size_t pointer; + size_t words_remaining; eword_t compressed, literals; eword_t rl, lw; int b; diff --git a/pack-bitmap.c b/pack-bitmap.c index ae0b57b..a31e529 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -118,6 +118,7 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st) */ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) { + size_t expected_bits; struct ewah_bitmap *b = ewah_pool_new(); int bitmap_size = ewah_read_mmap(b, @@ -130,6 +131,31 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) return NULL; } + /* + * It's OK for us to have too fewer bits than objects, as the EWAH + * writer may have simply left off an ending that is all-zeroes. + * + * However it's not OK for us to have too many bits, as that would + * entail touching objects that we don't have. We are careful + * enough to avoid doing so in later code, but in the case of + * nonsensical values, we would want to avoid even allocating + * memory to hold the expanded bitmap. + * + * There is one exception: we may "go over" to round up to the next + * 64-bit ewah word, since the storage comes in chunks of that size. + */ + expected_bits = index->pack->num_objects; + if (expected_bits & 63) { + expected_bits &= ~63; + expected_bits += 64; + } + if (b->bit_size > expected_bits) { + error("unexpected number of bits in bitmap: %"PRIuMAX" > %"PRIuMAX, + (uintmax_t)b->bit_size, (uintmax_t)expected_bits); + ewah_pool_free(b); + return NULL; + } + index->map_pos += bitmap_size; return b; } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html