Re: [PATCH 03/10] reftable/merged: stop using generic tables in the merged table

Justin Tobler <jltobler@xxxxxxxxx> · Mon, 19 Aug 2024 11:47:06 -0500

On 24/08/13 08:24AM, Patrick Steinhardt wrote:
> The merged table provides access to a reftable stack by merging the
> contents of those tables into a virtual table. These subtables are being
> tracked via `struct reftable_table`, which is a generic interface for
> accessing either a single reftable or a merged reftable. So in theory,
> it would be possible for the merged table to merge together other merged
> tables.
> 
> This is somewhat nonsensical though: we only ever set up a merged table
> over normal reftables, and there is no reason to do otherwise. This
> generic interface thus makes the code way harder to follow and reason
> about than really necessary. The abstraction layer may also have an
> impact on performance, even though the extra set of vtable function
> calls probably doesn't really matter.

Agreed! When I was reading through the reftable code for the first time
I remember being rather confused by this interface. It left me wondering
if merge tables could also be composed of other merge tables, but I
couldn't think of a valid reason to ever do so.

> Refactor the merged tables to use a `struct reftable_reader` for each of
> the subtables instead, which gives us direct access to the underlying
> tables. Adjust names accordingly.

Using `struct reftable_reader` directly instead of the generic `struct
reftable_table` sounds like the right decision and the flexibility
it provids is unneeded and only adds complexity.

> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
[snip] 
> diff --git a/reftable/merged.h b/reftable/merged.h
> index 2efe571da6..de5fd33f01 100644
> --- a/reftable/merged.h
> +++ b/reftable/merged.h
> @@ -12,8 +12,8 @@ license that can be found in the LICENSE file or at
>  #include "system.h"
>  
>  struct reftable_merged_table {
> -	struct reftable_table *stack;
> -	size_t stack_len;
> +	struct reftable_reader **readers;
> +	size_t readers_len;

The merged table is being made to directly reference the table readers
instead of going through the generic `struct reftable_table`. Makes
sense.

>  	uint32_t hash_id;
>  
>  	/* If unset, produce deletions. This is useful for compaction. For the
> diff --git a/reftable/reader.c b/reftable/reader.c
> index 29c99e2269..f7ae35da72 100644
> --- a/reftable/reader.c
> +++ b/reftable/reader.c
> @@ -605,9 +605,9 @@ static void iterator_from_table_iter(struct reftable_iterator *it,
>  	it->ops = &table_iter_vtable;
>  }
>  
> -static void reader_init_iter(struct reftable_reader *r,
> -			     struct reftable_iterator *it,
> -			     uint8_t typ)
> +void reader_init_iter(struct reftable_reader *r,
> +		      struct reftable_iterator *it,
> +		      uint8_t typ)
>  {
>  	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
>  
> diff --git a/reftable/reader.h b/reftable/reader.h
> index e869165f23..a2c204d523 100644
> --- a/reftable/reader.h
> +++ b/reftable/reader.h
> @@ -57,6 +57,10 @@ int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
>  void reader_close(struct reftable_reader *r);
>  const char *reader_name(struct reftable_reader *r);
>  
> +void reader_init_iter(struct reftable_reader *r,
> +		      struct reftable_iterator *it,
> +		      uint8_t typ);

At first I was wondering if we should prefix the function name with
`reftable_`, but this is only exposed as part of the internal reftable
interface and matches the format of other "reftable/reader.h" functions.