On 24/03/27 07:37AM, Patrick Steinhardt wrote: > The table iterator allows the caller to iterate through all records in a > reftable table. To do so it iterates through all blocks of the desired > type one by one, where for each block it creates a new block iterator > and yields all its entries. > > One of the things that is somewhat confusing in this context is who owns > the block reader that is being used to read the blocks and pass them to > the block iterator. Intuitively, as the table iterator is responsible > for iterating through the blocks, one would assume that this iterator is > also responsible for managing the lifecycle of the reader. And while it > somewhat is, the block reader is ultimately stored inside of the block > iterator. > > Refactor the code such that the block reader is instead fully managed by > the table iterator. Instead of passing the reader to the block iterator, > we now only end up passing the block data to it. Despite clearing up the > lifecycle of the reader, it will also allow for better reuse of the > reader in subsequent patches. > > The following benchmark prints a single matching ref out of 1 million > refs. Before: > > HEAP SUMMARY: > in use at exit: 13,603 bytes in 125 blocks > total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated > > After: > > HEAP SUMMARY: > in use at exit: 13,603 bytes in 125 blocks > total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated > > Note that while there are more allocation and free calls now, the > overall number of bytes allocated is significantly lower. The number of > allocations will be reduced significantly by the next patch though. > > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> ... > @@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src) > int block_iter_next(struct block_iter *it, struct reftable_record *rec) > { > struct string_view in = { > - .buf = it->br->block.data + it->next_off, > - .len = it->br->block_len - it->next_off, > + .buf = (unsigned char *) it->block + it->next_off, Would it be best to use the `uint8_t *` type instead of `unsigned char *` to match `string_view.buf`? Not sure if it matters in this case. > + .len = it->block_len - it->next_off, > }; ... > diff --git a/reftable/block.h b/reftable/block.h > index 601a1e0e89..b41efa5042 100644 > --- a/reftable/block.h > +++ b/reftable/block.h > @@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl, > void block_reader_release(struct block_reader *br); > > /* Returns the block type (eg. 'r' for refs) */ > -uint8_t block_reader_type(struct block_reader *r); > +uint8_t block_reader_type(const struct block_reader *r); > > /* Decodes the first key in the block */ > -int block_reader_first_key(struct block_reader *br, struct strbuf *key); > +int block_reader_first_key(const struct block_reader *br, struct strbuf *key); > > /* Iterate over entries in a block */ > struct block_iter { > /* offset within the block of the next entry to read. */ > uint32_t next_off; > - struct block_reader *br; > + const unsigned char *block; Same question here. Would it be better to use `uint8_t *`? Or does it not really matter? > + size_t block_len; > + int hash_size; > > /* key for last entry we read. */ > struct strbuf last_key; > @@ -106,17 +108,22 @@ struct block_iter { > } > > /* Position `it` at start of the block */ > -void block_iter_seek_start(struct block_iter *it, struct block_reader *br); > +void block_iter_seek_start(struct block_iter *it, const struct block_reader *br); > > /* Position `it` to the `want` key in the block */ > -int block_iter_seek_key(struct block_iter *it, struct block_reader *br, > +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br, > struct strbuf *want); > > -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src); > +void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src); > > /* return < 0 for error, 0 for OK, > 0 for EOF. */ > int block_iter_next(struct block_iter *it, struct reftable_record *rec); > > +/* > + * Reset the block iterator to pristine state without releasing its memory. > + */ Do we want to make the comment a single line to match the other adjacent examples? > +void block_iter_reset(struct block_iter *it); > + > /* deallocate memory for `it`. The block reader and its block is left intact. */ > void block_iter_close(struct block_iter *it); > ... -Justin