Han-Wen Nienhuys wrote: > On Fri, Oct 2, 2020 at 5:58 AM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote: >> Adding a header in a separate patch from the implementation doesn't >> match the usual practice. Can we add declarations in the same patch >> as the functions being declared instead? > > We could, but it would considerably complicate work on this patch > series, as the commit boundary then doesn't fall on file boundaries > anymore. Would you be open to having multiple headers for the public > interface? eg. reftable-record.h, reftable-reader.h etc. ? I'd be more than open to it: I think that would make for a clearer API. For example, a caller like refs.c wouldn't have any need to use reftable-block.h because that's a lower-level detail that operates behind the scenes when operating on a stack of reftables. So I think this would be a good step toward addressing the feedback Jonathan gave about the set of functions exposed by the API being overwhelming. >>> + /* Misuse of the API: >>> + - on writing a record with NULL refname. >>> + - on writing a reftable_ref_record outside the table limits >>> + - on writing a ref or log record before the stack's next_update_index >>> + - on writing a log record with multiline message with >>> + exact_log_message unset >>> + - on reading a reftable_ref_record from log iterator, or vice versa. >>> + */ >>> + REFTABLE_API_ERROR = -6, >> >> Should these call BUG()? Or is it useful for some callers to be able >> to recover from these errors? > > Since this was written as a standalone library, it's up to the caller > to decide what should be done. I'm not strongly opinionated about this, but just a quick note: this implies that the library would need to make sure it is producing a valid state in error cases. Does the API documentation describe what state a handle is in after an error --- e.g., what operations are permitted after that? >>> + >>> +/* >>> + * Convert the numeric error code to an equivalent errno code. >>> + */ >>> +int reftable_error_to_errno(int err); >> >> What is the intended use of this function? > > The read_raw_ref method in the ref backend API uses errno values as > out-of-band communication mechanism. Could we change that? It sounds error-prone. [...] >>> +/* Set the range of update indices for the records we will add. When >>> + writing a table into a stack, the min should be at least >>> + reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned. >>> + >>> + For transactional updates, typically min==max. When converting an existing >>> + ref database into a single reftable, this would be a range of update-index >>> + timestamps. >>> + */ >>> +void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >>> + uint64_t max); >> >> What happens if I write to a reftable_writer without setting limits >> first? > > you get an REFTABLE_API_ERROR. (To be clear, with all of these questions, I am asking not only for my own curiosity but so that the documentation, naming, etc can be cleared up to help the next person who wonders the same thing.) [...] >> Do I pass in the 'struct block_source *' as the source arg? If so, why >> are these declared as void *? > > you pass in block_source->arg. Should struct implementations of > polymorphic types carry a pointer to their own vtable instead? I think a pointer to block_source would make it more self-explanatory, yes. (Aside: this is a difference between how Go and e.g. Java and C++ implement polymorphism. In Go, an interface object contains a vtable and a pointer to the object. In Java and C++, the object contains a vptr.) >> Is the reason this manages the buffer instead of requiring a >> caller-supplied buffer to support zero-copy? > > Log blocks are compressed, so the caller doesn't know the correct size > to supply. By letting the block source handle the management, we can > swap out the block read from the file for a block managed by malloc on > decompressing a log block. Hm, my naive assumption would have been that we'd use different buffers for the compressed and uncompressed data. >>> + Generic tables >>> + >>> + A unified API for reading tables, either merged tables, or single readers. >> >> Are there callers/helpers that don't know whether they want one or the >> other? > > The setup with per-worktree refs means that there are two reftable > stacks in a .git repo. In order to iterate over the entire ref space, > you have to merge two merged tables, but the stack itslef merges a set > of simple reftables. I see. That's subtle, so it seems worth documenting for the next person reading this documentation and wondering why. Thanks, Jonathan