Re: [PATCH v2 02/13] reftable: define the public API

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 12 Oct 2020 09:57:30 -0700

Han-Wen Nienhuys wrote:
> On Fri, Oct 2, 2020 at 5:58 AM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:

>> Adding a header in a separate patch from the implementation doesn't
>> match the usual practice.  Can we add declarations in the same patch
>> as the functions being declared instead?
>
> We could, but it would considerably complicate work on this patch
> series, as the commit boundary then doesn't fall on file boundaries
> anymore. Would you be open to having multiple headers for the public
> interface? eg. reftable-record.h, reftable-reader.h etc. ?

I'd be more than open to it: I think that would make for a clearer API.

For example, a caller like refs.c wouldn't have any need to use
reftable-block.h because that's a lower-level detail that operates
behind the scenes when operating on a stack of reftables.  So I think
this would be a good step toward addressing the feedback Jonathan gave
about the set of functions exposed by the API being overwhelming.

>>> +     /* Misuse of the API:
>>> +        - on writing a record with NULL refname.
>>> +        - on writing a reftable_ref_record outside the table limits
>>> +        - on writing a ref or log record before the stack's next_update_index
>>> +        - on writing a log record with multiline message with
>>> +        exact_log_message unset
>>> +        - on reading a reftable_ref_record from log iterator, or vice versa.
>>> +     */
>>> +     REFTABLE_API_ERROR = -6,
>>
>> Should these call BUG()?  Or is it useful for some callers to be able
>> to recover from these errors?
>
> Since this was written as a standalone library, it's up to the caller
> to decide what should be done.

I'm not strongly opinionated about this, but just a quick note: this
implies that the library would need to make sure it is producing a
valid state in error cases.

Does the API documentation describe what state a handle is in after
an error --- e.g., what operations are permitted after that?

>>> +
>>> +/*
>>> + * Convert the numeric error code to an equivalent errno code.
>>> + */
>>> +int reftable_error_to_errno(int err);
>>
>> What is the intended use of this function?
>
> The read_raw_ref method in the ref backend API uses errno values as
> out-of-band communication mechanism.

Could we change that?  It sounds error-prone.

[...]
>>> +/* Set the range of update indices for the records we will add.  When
>>> +   writing a table into a stack, the min should be at least
>>> +   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
>>> +
>>> +   For transactional updates, typically min==max. When converting an existing
>>> +   ref database into a single reftable, this would be a range of update-index
>>> +   timestamps.
>>> + */
>>> +void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>>> +                             uint64_t max);
>>
>> What happens if I write to a reftable_writer without setting limits
>> first?
>
> you get an REFTABLE_API_ERROR.

(To be clear, with all of these questions, I am asking not only for my
own curiosity but so that the documentation, naming, etc can be
cleared up to help the next person who wonders the same thing.)

[...]
>> Do I pass in the 'struct block_source *' as the source arg?  If so, why
>> are these declared as void *?
>
> you pass in block_source->arg. Should struct implementations of
> polymorphic types carry a pointer to their own vtable instead?

I think a pointer to block_source would make it more self-explanatory,
yes.

(Aside: this is a difference between how Go and e.g. Java and C++
implement polymorphism.  In Go, an interface object contains a vtable
and a pointer to the object.  In Java and C++, the object contains a
vptr.)

>> Is the reason this manages the buffer instead of requiring a
>> caller-supplied buffer to support zero-copy?
>
> Log blocks are compressed, so the caller doesn't know the correct size
> to supply. By letting the block source handle the management, we can
> swap out the block read from the file for a block managed by malloc on
> decompressing a log block.

Hm, my naive assumption would have been that we'd use different
buffers for the compressed and uncompressed data.

>>> + Generic tables
>>> +
>>> + A unified API for reading tables, either merged tables, or single readers.
>>
>> Are there callers/helpers that don't know whether they want one or the
>> other?
>
> The setup with per-worktree refs means that there are two reftable
> stacks in a .git repo. In order to iterate over the entire ref space,
> you have to merge two merged tables, but the stack itslef merges a set
> of simple reftables.

I see.  That's subtle, so it seems worth documenting for the next
person reading this documentation and wondering why.

Thanks,
Jonathan