Re: [PATCH v2 02/13] reftable: define the public API

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> · Mon, 30 Nov 2020 15:55:57 +0100

On Mon, Oct 12, 2020 at 6:57 PM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> > We could, but it would considerably complicate work on this patch
> > series, as the commit boundary then doesn't fall on file boundaries
> > anymore. Would you be open to having multiple headers for the public
> > interface? eg. reftable-record.h, reftable-reader.h etc. ?
>
> I'd be more than open to it: I think that would make for a clearer API.

Done.

> >> Should these call BUG()?  Or is it useful for some callers to be able
> >> to recover from these errors?
> >
> > Since this was written as a standalone library, it's up to the caller
> > to decide what should be done.
>
> I'm not strongly opinionated about this, but just a quick note: this
> implies that the library would need to make sure it is producing a
> valid state in error cases.

This is what it already does. I tried to clarify this in some places.

> Does the API documentation describe what state a handle is in after
> an error --- e.g., what operations are permitted after that?

API_ERROR should not result in state changes.

> >>> +int reftable_error_to_errno(int err);
> >>
> >> What is the intended use of this function?
> >
> > The read_raw_ref method in the ref backend API uses errno values as
> > out-of-band communication mechanism.
>
> Could we change that?  It sounds error-prone.

I moved it into refs/reftable-backend.c; I'd rather not also have to
clean up the files backend as part of this work, though.

> >> Do I pass in the 'struct block_source *' as the source arg?  If so, why
> >> are these declared as void *?
> >
> > you pass in block_source->arg. Should struct implementations of
> > polymorphic types carry a pointer to their own vtable instead?
>
> I think a pointer to block_source would make it more self-explanatory,
> yes.

This has a number of downsides, though. It means I have to introduce
separate structs strbuf_blocksource and file_blocksource, along with
functions to create and deallocate them, so it makes the whole thing
more unwieldy to use.

> >> Is the reason this manages the buffer instead of requiring a
> >> caller-supplied buffer to support zero-copy?
> >
> > Log blocks are compressed, so the caller doesn't know the correct size
> > to supply. By letting the block source handle the management, we can
> > swap out the block read from the file for a block managed by malloc on
> > decompressing a log block.
>
> Hm, my naive assumption would have been that we'd use different
> buffers for the compressed and uncompressed data.

Maybe you can comment on how to solve this differently when we get to
the implementation of block reading/writing in the commit series.

> >>> + Generic tables
> >>> +
> >>> + A unified API for reading tables, either merged tables, or single readers.
> >>
> >> Are there callers/helpers that don't know whether they want one or the
> >> other?
> >
> > The setup with per-worktree refs means that there are two reftable
> > stacks in a .git repo. In order to iterate over the entire ref space,
> > you have to merge two merged tables, but the stack itslef merges a set
> > of simple reftables.
>
> I see.  That's subtle, so it seems worth documenting for the next
> person reading this documentation and wondering why.

Done.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado