Re: [PATCH v2 02/13] reftable: define the public API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 2, 2020 at 5:58 AM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Adding a header in a separate patch from the implementation doesn't
> match the usual practice.  Can we add declarations in the same patch
> as the functions being declared instead?

We could, but it would considerably complicate work on this patch
series, as the commit boundary then doesn't fall on file boundaries
anymore. Would you be open to having multiple headers for the public
interface? eg. reftable-record.h, reftable-reader.h etc. ?

> > +     /* Misuse of the API:
> > +        - on writing a record with NULL refname.
> > +        - on writing a reftable_ref_record outside the table limits
> > +        - on writing a ref or log record before the stack's next_update_index
> > +        - on writing a log record with multiline message with
> > +        exact_log_message unset
> > +        - on reading a reftable_ref_record from log iterator, or vice versa.
> > +     */
> > +     REFTABLE_API_ERROR = -6,
>
> Should these call BUG()?  Or is it useful for some callers to be able
> to recover from these errors?

Since this was written as a standalone library, it's up to the caller
to decide what should be done. It also simplifies testing: you can
verify that incorrect API usage returns a certain error code, but a
BUG() that prints an error or crashes the program is much harder to
verify.

> > +     /* Dir/file conflict. */
>
> This is when adding a ref?

yes.

> > +
> > +/*
> > + * Convert the numeric error code to an equivalent errno code.
> > + */
> > +int reftable_error_to_errno(int err);
>
> What is the intended use of this function?

The read_raw_ref method in the ref backend API uses errno values as
out-of-band communication mechanism.

> > +/* Set the range of update indices for the records we will add.  When
> > +   writing a table into a stack, the min should be at least
> > +   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> > +
> > +   For transactional updates, typically min==max. When converting an existing
> > +   ref database into a single reftable, this would be a range of update-index
> > +   timestamps.
> > + */
> > +void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> > +                             uint64_t max);
>
> What happens if I write to a reftable_writer without setting limits
> first?

you get an REFTABLE_API_ERROR.

> > +/* block_source_vtable are the operations that make up block_source */
> > +struct reftable_block_source_vtable {
> > +     /* returns the size of a block source */
> > +     uint64_t (*size)(void *source);
> > +
> > +     /* reads a segment from the block source. It is an error to read
> > +        beyond the end of the block */
> > +     int (*read_block)(void *source, struct reftable_block *dest,
> > +                       uint64_t off, uint32_t size);
> > +     /* mark the block as read; may return the data back to malloc */
> > +     void (*return_block)(void *source, struct reftable_block *blockp);
> > +
> > +     /* release all resources associated with the block source */
> > +     void (*close)(void *source);
> > +};
>
> This is abstracting file input?  I wonder if some name emphasizing the
> 'read' operation like block_reader might be clearer.
>
> Do I pass in the 'struct block_source *' as the source arg?  If so, why
> are these declared as void *?

you pass in block_source->arg. Should struct implementations of
polymorphic types carry a pointer to their own vtable instead?

> Is the reason this manages the buffer instead of requiring a
> caller-supplied buffer to support zero-copy?

Log blocks are compressed, so the caller doesn't know the correct size
to supply. By letting the block source handle the management, we can
swap out the block read from the file for a block managed by malloc on
decompressing a log block.

> > +/* returns the hash ID used in this table. */
> > +uint32_t reftable_reader_hash_id(struct reftable_reader *r);
>
> What is a hash id?

The hash identifier, eg. "sha1" for SHA1 and "s256" for SHA-256.

> [...]
> > + Generic tables
> > +
> > + A unified API for reading tables, either merged tables, or single readers.
>
> Are there callers/helpers that don't know whether they want one or the
> other?

The setup with per-worktree refs means that there are two reftable
stacks in a .git repo. In order to iterate over the entire ref space,
you have to merge two merged tables, but the stack itslef merges a set
of simple reftables.

> [...]
> > +/* holds a transaction to add tables at the top of a stack. */
> > +struct reftable_addition;
>
> When would I add multiple tables at once to the top?  Is this used
> during compacting, for example?

No. You'd usually add only one table, but at the same time, it seemed
kind of arbitrary to allow adding only one table. FWIW, I added this
because libgit2 wants to be able to lock a ref database: creating the
reftable_addition institutes the lock.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux