On Thu, Mar 07, 2024 at 04:26:38PM +0100, Patrick Steinhardt wrote:
> Records store their keys prefix-compressed. As many records will share a
> common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
> of disk space. The downside of this is that it is not possible to just
> seek into the middle of a block and consume the corresponding record
> because it may depend on prefixes read from preceding records.
>
> To help with this use case, the reftable format writes every n'th record
> without using prefix compression, which is called a "restart". The list
> of restarts is stored at the end of each block so that a reader can
> figure out entry points at which to read a full record without having to
> read all preceding records.
>
> This allows us to do a binary search over the records in a block when
> searching for a particular key by iterating through the restarts until
> we have found the section in which our record must be located. From
> there on we perform a linear search to locate the desired record.
>
> This mechanism is broken though. In `block_reader_seek()` we call
> `binsearch()` over the count of restarts in the current block. The
> function we pass to compare records with each other computes the key at
> the current index and then compares it to our search key by calling
> `strbuf_cmp()`, returning its result directly. But `binsearch()` expects
> us to return a truish value that indicates whether the key at the
> current index is smaller than the searched-for key. And unless our key
> exactly matches the value at the restart counter we always end up
> returning a truish value.
>
> The consequence is that `binsearch()` essentially always returns 0,
> indicating to us that we must start searching right at the beginning of
> the block. This works by chance because we now always do a linear scan
> from the start of the block, and thus we would still end up finding the
> desired record. But needless to say, this makes the optimization quite
> useless.
>
> Fix this bug by returning whether the current key is smaller than the
> searched key. As the current behaviour was correct, even if inefficient,
> it is not possible to write a test for this. Furthermore, it is also not
> really possible to demonstrate in a benchmark that this fix speeds up
> seeking records.
>
> This may cause the reader to question whether this binary search makes
> sense in the first place if it doesn't even help with performance. But
> it would end up helping if we were to read a reftable with a much larger
> block size. Blocks can be up to 16MB in size, in which case it will
> become much more important to avoid the linear scan. We are not yet
> ready to read or write such larger blocks though, so we have to live
> without a benchmark demonstrating this.
>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  reftable/block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/reftable/block.c b/reftable/block.c
> index 72eb73b380..3d7a7022e7 100644
> --- a/reftable/block.c
> +++ b/reftable/block.c
> @@ -302,7 +302,7 @@ static int restart_key_less(size_t idx, void *args)
>
>  	result = strbuf_cmp(&a->key, &rkey);
>  	strbuf_release(&rkey);
> -	return result;
> +	return result <= 0;
>  }
>
>  void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> --
> 2.44.0
>

Hum. I noticed that this causes a memory leak in our CI. I'll need to
investigate.

Patrick
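
For readers following along, below is a rough, self-contained sketch of the
seek strategy the commit message describes. It is not the reftable code
itself: binsearch(), restart_key_less() and struct search_args as written
here are simplified stand-ins (plain C strings instead of strbufs, no key
decoding), meant only to illustrate why the predicate handed to the binary
search must be monotonic rather than the raw comparison result.

	#include <stddef.h>
	#include <stdio.h>
	#include <string.h>

	/*
	 * Simplified stand-in for a lower-bound style binary search: returns
	 * the smallest index in [0, sz) for which f() is true, or sz if there
	 * is none. This only works if f() is monotonic, i.e. false for a
	 * prefix of the indices and true for the rest.
	 */
	static size_t binsearch(size_t sz, int (*f)(size_t idx, void *args),
				void *args)
	{
		size_t lo = 0, hi = sz;

		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;
			if (f(mid, args))
				hi = mid;
			else
				lo = mid + 1;
		}
		return lo;
	}

	struct search_args {
		const char **restart_keys; /* keys stored verbatim at restarts */
		const char *wanted;        /* key we are seeking */
	};

	/*
	 * Predicate in the spirit of the fixed restart_key_less(): true once
	 * the wanted key sorts at or before the restart key. Returning the
	 * raw strcmp() result instead would be "true" for every non-matching
	 * key, so the predicate would not be monotonic and the search would
	 * always land on index 0.
	 */
	static int restart_key_less(size_t idx, void *args)
	{
		struct search_args *a = args;
		return strcmp(a->wanted, a->restart_keys[idx]) <= 0;
	}

	int main(void)
	{
		/* Every n'th record is written without prefix compression. */
		const char *restarts[] = {
			"refs/heads/a", "refs/heads/m",
			"refs/heads/t", "refs/tags/v1",
		};
		struct search_args args = {
			.restart_keys = restarts,
			.wanted = "refs/heads/p",
		};
		size_t i = binsearch(4, restart_key_less, &args);

		/*
		 * Step back one restart so the linear scan cannot start past
		 * the wanted key.
		 */
		if (i > 0)
			i--;

		printf("linear scan starts at restart %zu (key %s)\n",
		       i, restarts[i]);
		return 0;
	}

Compiled as C99, this prints restart 1 ("refs/heads/m") as the starting
point for "refs/heads/p". If the predicate returned the raw strcmp() result,
the search would always report index 0, which is the degradation to a
full-block linear scan described in the commit message above.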