Re: [PATCH v2 02/13] reftable: define the public API

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> · Sat, 10 Oct 2020 18:57:56 +0200

Hi Jonathan,

thanks for a detailed look at the API. I'm replying to some queries
here, but I won't have time for another 2 weeks to revisit the code.

On Thu, Oct 8, 2020 at 3:41 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> A typical user would just want to do refname->hash, refname->log, or possibly
> hash->refname lookups, so a ref record would be meant for low-level

You are forgetting refname -> symref target and refname -> peeled tag.

> users. I can't think of a typical use case, but even if such existed, I
> wouldn't expect this API - in particular, the refname is not written as
> such on disk (it is composed from a prefix and a suffix, if I remember
> correctly), so I would expect a function that one can call to write the
> refname, but not for it to appear in a data structure.

It works with records, because it is more practical for the low level
code, but also because it obviates defining many functions for the
different types of lookups and seeks.

> > +/* returns whether 'ref' represents a deletion */
> > +int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
>
> This looks unorthogonal - looking at the spec, the "value type" can
> represent a deletion, one object name (value of the ref), two object
> names (value of the ref + peeled target), or symbolic reference. If
> we're willing to allocate memory for a ref record, I think we can afford
> the extra 4 bytes (maybe 8) and use a tagged union instead.

I guess you want something like

  struct {
     enum TYPE  { REF_DEL, REF_1VAL, REF_2VAL, REF_SYM } record_type;
     char *ref_name;
     uint64_t update_index;
     union  {
          uint_8 *val1;
          struct { uint8_t *val1, *val2; } val2;
          char *sym;
     };
  };

I suspect it makes initialization more verbose, but yes, I could do that.

> > +/* returns whether two reftable_ref_records are the same */
> > +int reftable_ref_record_equal(struct reftable_ref_record *a,
> > +                           struct reftable_ref_record *b, int hash_size);
>
> Not sure what this will be used for.

This is useful for testing.

> > +/* Set the range of update indices for the records we will add.  When
> > +   writing a table into a stack, the min should be at least
> > +   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> > +
> > +   For transactional updates, typically min==max. When converting an existing
> > +   ref database into a single reftable, this would be a range of update-index
> > +   timestamps.
> > + */
> > +void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> > +                             uint64_t max);
>
> This seems to be here because we want to write the file in a single
> pass, and the update index maximum and minimum appear in the header. If
> we were allowed to seek while writing, could the update index maximum
> and minimum be deduced instead?

No. The ref record update_index is delta encoded against the min
update_index, so you have to know it upfront.

If I could redesign the format, I'd leave out the max update_index
from the header (because it can be determined afterwards.), but it's
too late now.

>
> > +/* adds a reftable_ref_record. Must be called in ascending
> > +   order. The update_index must be within the limits set by
> > +   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
> > +
> > +   It is an error to write a ref record after a log record.
> > + */
> > +int reftable_writer_add_ref(struct reftable_writer *w,
> > +                         struct reftable_ref_record *ref);
> > +
> > +/* Convenience function to add multiple refs. Will sort the refs by
> > +   name before adding. */
> > +int reftable_writer_add_refs(struct reftable_writer *w,
> > +                          struct reftable_ref_record *refs, int n);
>
> Since refs is an array of objects and not an array of pointers, these
> two functions could be combined - the first is the same as calling the
> second with n=1.
>
> Also, ascending order of what?

Of record keys, ie. refname in case of ref records.

> The user will also need to know where to get the update index from - I
> presume that this will be the maximum update index of any record with
> the given refname + 1.

there is a reftable_stack_next_update_index() for that.

> > +/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
> > +   end of iteration.
> > +*/
> > +int reftable_iterator_next_ref(struct reftable_iterator *it,
> > +                            struct reftable_ref_record *ref);
> > +
> > +/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
> > +   end of iteration.
> > +*/
> > +int reftable_iterator_next_log(struct reftable_iterator *it,
> > +                            struct reftable_log_record *log);
>
> From my recollection, in Git, we typically use foreach functions with a
> callback that is invoked once for each result. I think that's preferable
> to this approach.

The iterator interface has the advantage that it captures reading both
individual entries (read_raw_ref) and iteration. It's also a natural
structure, because internally the iteration state has to be stored
(iterating merged tables stores the iterators in a priority queue.)

> [snip]
>
> > +/****************************************************************
> > + Merged tables
> > +
> > + A ref database kept in a sequence of table files. The merged_table presents a
> > + unified view to reading (seeking, iterating) a sequence of immutable tables.
> > + ****************************************************************/
> > +
> > +/* A merged table is implements seeking/iterating over a stack of tables. */
> > +struct reftable_merged_table;
> > +
> > +/* A generic reftable; see below. */
> > +struct reftable_table;
>
> Why would we need to see an individual reftable?

It could be useful diagnostics and troubleshooting. For example, if
you are debugging or trying to troubleshoot/fsck a repo, it might be
interesting to read a single file from .git/reftable/

Philosophically, since we have to expose the functionality to write a
single table, it seems reasonable to read single tables for symmetry
as well.

> Could we just represent
> a merged table as a virtual concatenation of blocks (as if all the
> blocks were in the same file)?

No. The format doesn't work like that.

> > +/* reftable_new_merged_table creates a new merged table. It takes ownership of
> > +   the stack array.
> > +*/
> > +int reftable_new_merged_table(struct reftable_merged_table **dest,
> > +                           struct reftable_table *stack, int n,
> > +                           uint32_t hash_id);
>
> I presume this would be used for things like compacting a few reftables
> together? In which case, I would expect this function to just take a
> list of filenames.

The merging of tables has nothing to do with the filesystem, and could
be done with reftables represented as in memory structure. This is
also how the unittests work.

> > +/* returns an iterator positioned just before 'name' */
> > +int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
> > +                                struct reftable_iterator *it,
> > +                                const char *name);
> > +
> > +/* returns an iterator for log entry, at given update_index */
> > +int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
> > +                                   struct reftable_iterator *it,
> > +                                   const char *name, uint64_t update_index);
> > +
> > +/* like reftable_merged_table_seek_log_at but look for the newest entry. */
> > +int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
> > +                                struct reftable_iterator *it,
> > +                                const char *name);
>
> Why do iterators need to be precisely positioned if this merged table is
> immutable?

I don't understand the question. How would you read a single ref from
a merged table without a seek function?

> > +/* convenience function to read a single ref. Returns < 0 for error, 0
> > +   for success, and 1 if ref not found. */
> > +int reftable_table_read_ref(struct reftable_table *tab, const char *name,
> > +                         struct reftable_ref_record *ref);
>
> I presume this returns the most up-to-date record in the (possibly
> merged) table? So we can just read the hash off the record to know what
> this ref points to.

correct.

> > +/****************************************************************
> > + Mutable ref database
> > +
> > + The stack presents an interface to a mutable sequence of reftables.
> > + ****************************************************************/
>
> So I would expect ref mutations to be done by opening the existing
> reftables as a merged table, open a reftable writer, write all the
> changes that need to be written, and update the tables.list file...

Roughly, yes, but there are other considerations: you have to check
for concurrent updates (by the time you finish reading tables.list, a
compaction may have deleted some of the tables referenced, so you have
to retry). If there are any failures, the transaction fails and you
have to clean up already written tables. If the write succeeds, but
the stack becomes unbalanced, you have to do a compaction. If you have
to reload data, you want to avoid closing and opening the same file.

If you are curious what the stack does, have a look at stack.c.

Also, this is for separation of logic. The merged table is composed of
reftables that have no particular type of storage, but support the
read interface. The stack is what ties the merged table to a
posix-like file system, and provides mutations.

> > +/* holds a transaction to add tables at the top of a stack. */
> > +struct reftable_addition;
> > +
> > +/*
> > +  returns a new transaction to add reftables to the given stack. As a side
> > +  effect, the ref database is locked.
> > +*/
> > +int reftable_stack_new_addition(struct reftable_addition **dest,
> > +                             struct reftable_stack *st);
> > +
> > +/* Adds a reftable to transaction. */
> > +int reftable_addition_add(struct reftable_addition *add,
> > +                       int (*write_table)(struct reftable_writer *wr,
> > +                                          void *arg),
> > +                       void *arg);
> > +
> > +/* Commits the transaction, releasing the lock. */
> > +int reftable_addition_commit(struct reftable_addition *add);
> > +
> > +/* Release all non-committed data from the transaction, and deallocate the
> > +   transaction. Releases the lock if held. */
> > +void reftable_addition_destroy(struct reftable_addition *add);
>
> We do need a transaction to write the new tables.list and then
> atomically update the repository with it, but I don't think we need one
> just to add a reftable to the stack.

see above.

> > +/* compacts all reftables into a giant table. Expire reflog entries if config is
> > + * non-NULL */
> > +int reftable_stack_compact_all(struct reftable_stack *st,
> > +                            struct reftable_log_expiry_config *config);
>
> Ah, the stack is used for compacting as well. I don't think compacting
> belongs here though - it should be its own thing that can make use of
> the merged-table reader and the single-table writer.

Why should it be like that, and where should compaction live?

> Some things that I think are missing:
>
>  - foreach all latest record of all refs (in any order) (I see some

Ref records are keyed by name, so the iteration produces each ref only once.

>    functions above that can set the position of the iterator, but I
>    don't think that the iterator skips over irrelevant refs?

What makes you think that?

> So we can't
>    use it for iterating over all refs if we don't care about the history
>    of refs)

I think you might be confusing the refs with reflogs here. The history
of a ref is kept in the reflog. This is why functions that want to
seek to a log record need an update_index timestamp.

>  - foreach all log entries of one ref (or all refs)

Before reviewing the API header here, I think it might be useful to
also look at the commit in the other series that uses the API to
implement the git ref backend. It shows how the foreach functions can
be implemented using the iterator interface.

> In summary, I would have expected a mechanism to read multiple (possibly
> one) reftable and perform queries on it, a mechanism to write a single
> reftable, and some functions to update tables.list.

I understand your expectation, but I think that expectation glosses
over important details of how it actually works.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado