Re: [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter`

Patrick Steinhardt <ps@xxxxxx> · Wed, 3 Apr 2024 15:10:11 +0200

On Tue, Apr 02, 2024 at 11:52:58PM -0500, Justin Tobler wrote:
> On 24/03/27 07:37AM, Patrick Steinhardt wrote:
> > The table iterator allows the caller to iterate through all records in a
> > reftable table. To do so it iterates through all blocks of the desired
> > type one by one, where for each block it creates a new block iterator
> > and yields all its entries.
> > 
> > One of the things that is somewhat confusing in this context is who owns
> > the block reader that is being used to read the blocks and pass them to
> > the block iterator. Intuitively, as the table iterator is responsible
> > for iterating through the blocks, one would assume that this iterator is
> > also responsible for managing the lifecycle of the reader. And while it
> > somewhat is, the block reader is ultimately stored inside of the block
> > iterator.
> > 
> > Refactor the code such that the block reader is instead fully managed by
> > the table iterator. Instead of passing the reader to the block iterator,
> > we now only end up passing the block data to it. Despite clearing up the
> > lifecycle of the reader, it will also allow for better reuse of the
> > reader in subsequent patches.
> > 
> > The following benchmark prints a single matching ref out of 1 million
> > refs. Before:
> > 
> >   HEAP SUMMARY:
> >       in use at exit: 13,603 bytes in 125 blocks
> >     total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated
> > 
> > After:
> > 
> >   HEAP SUMMARY:
> >       in use at exit: 13,603 bytes in 125 blocks
> >     total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
> > 
> > Note that while there are more allocation and free calls now, the
> > overall number of bytes allocated is significantly lower. The number of
> > allocations will be reduced significantly by the next patch though.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ...
> > @@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> >  int block_iter_next(struct block_iter *it, struct reftable_record *rec)
> >  {
> >  	struct string_view in = {
> > -		.buf = it->br->block.data + it->next_off,
> > -		.len = it->br->block_len - it->next_off,
> > +		.buf = (unsigned char *) it->block + it->next_off,
> 
> Would it be best to use the `uint8_t *` type instead of `unsigned char *`
> to match `string_view.buf`? Not sure if it matters in this case.

It doesn't really. `uint8_t` should always be equivalent to `unsigned
char`. I'd also think that `unsigned char` is a bit more idiomatic in
our codebase, but may be mistaken there.

> > +		.len = it->block_len - it->next_off,
> >  	};
> ...  
> > diff --git a/reftable/block.h b/reftable/block.h
> > index 601a1e0e89..b41efa5042 100644
> > --- a/reftable/block.h
> > +++ b/reftable/block.h
> > @@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
> >  void block_reader_release(struct block_reader *br);
> >  
> >  /* Returns the block type (eg. 'r' for refs) */
> > -uint8_t block_reader_type(struct block_reader *r);
> > +uint8_t block_reader_type(const struct block_reader *r);
> >  
> >  /* Decodes the first key in the block */
> > -int block_reader_first_key(struct block_reader *br, struct strbuf *key);
> > +int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
> >  
> >  /* Iterate over entries in a block */
> >  struct block_iter {
> >  	/* offset within the block of the next entry to read. */
> >  	uint32_t next_off;
> > -	struct block_reader *br;
> > +	const unsigned char *block;
> 
> Same question here. Would it be better to use `uint8_t *`? Or does it not
> really matter?

Same answer :)

> > +	size_t block_len;
> > +	int hash_size;
> >  
> >  	/* key for last entry we read. */
> >  	struct strbuf last_key;
> > @@ -106,17 +108,22 @@ struct block_iter {
> >  }
> >  
> >  /* Position `it` at start of the block */
> > -void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
> > +void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
> >  
> >  /* Position `it` to the `want` key in the block */
> > -int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
> > +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
> >  			struct strbuf *want);
> >  
> > -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
> > +void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
> >  
> >  /* return < 0 for error, 0 for OK, > 0 for EOF. */
> >  int block_iter_next(struct block_iter *it, struct reftable_record *rec);
> >  
> > +/*
> > + * Reset the block iterator to pristine state without releasing its memory.
> > + */
> 
> Do we want to make the comment a single line to match the other adjacent
> examples?

Can do. I'll refrain from sending a new version of this patch series for
now though as none of the comments really need addressing, in my
opinion. But please, feel free to push back in case you disagree.

Patrick

> > +void block_iter_reset(struct block_iter *it);
> > +
> >  /* deallocate memory for `it`. The block reader and its block is left intact. */
> >  void block_iter_close(struct block_iter *it);
> >  
> ...
> 
> -Justin
Attachment:
signature.asc

Description: PGP signature