Re: [PATCH v2 4/7] reftable/block: refactor binary search over restart points

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24/04/02 07:15PM, Patrick Steinhardt wrote:
> On Tue, Apr 02, 2024 at 11:42:29AM -0500, Justin Tobler wrote:
> > On 24/03/25 11:10AM, Patrick Steinhardt wrote:
> > > When seeking a record in our block reader we perform a binary search
> > > over the block's restart points so that we don't have to do a linear
> > > scan over the whole block. The logic to do so is quite intricate though,
> > > which makes it hard to understand.
> > > 
> > > Improve documentation and rename some of the functions and variables so
> > > that the code becomes easier to understand overall. This refactoring
> > > should not result in any change in behaviour.
> > > 
> > > Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> > > ---
> > ...  
> > > -	i = binsearch(br->restart_count, &restart_key_less, &args);
> > > +	/*
> > > +	 * Perform a binary search over the block's restart points, which
> > > +	 * avoids doing a linear scan over the whole block. Like this, we
> > > +	 * identify the section of the block that should contain our key.
> > > +	 *
> > > +	 * Note that we explicitly search for the first restart point _greater_
> > > +	 * than the sought-after record, not _greater or equal_ to it. In case
> > > +	 * the sought-after record is located directly at the restart point we
> > > +	 * would otherwise start doing the linear search at the preceding
> > > +	 * restart point. While that works alright, we would end up scanning
> > > +	 * too many record.
> > > +	 */
> > > +	i = binsearch(br->restart_count, &restart_needle_less, &args);
> > > +
> > > +	/*
> > > +	 * Now there are multiple cases:
> > > +	 *
> > > +	 *   - `i == 0`: The wanted record must be contained before the first
> > > +	 *     restart point. We will thus start searching for the record in
> > > +	 *     that section after accounting for the header offset.
> > 
> > To clarify my understanding, does a restart_offset not exist at the
> > beginning of a block? I was originally under the impression that the
> > first reset point would be at the beginning of the block (or just after
> > the header). If this was the case, the first restart point being greater
> > would indicate that the wanted record is not contained within the block.
> 
> It wouldn't make much sense to include it as a restart point. A restart
> point is a point where the prefix compression will be reset such that
> the record at that point can be read without reading preceding records.
> This is always implicitly true for the first record in a block as it is
> never prefix-compressed. Consequently, writing a restart point for the
> first record would be a waste of disk space.

Thanks Patrick! Good to know :)

>From Documentation/technical/reftable.txt:

> As the first ref block shares the first file block with the file
> header, all restart_offset in the first block are relative to the
> start of the file (position 0), and include the file header. This 
> forces the first restart_offset to be 28.

I'm assumming this is out-of-date.

-Justin




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux