Patrick Steinhardt <ps@xxxxxx> writes: > On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: >> diff --git a/reftable/writer.c b/reftable/writer.c >> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 >> --- a/reftable/writer.c >> +++ b/reftable/writer.c >> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, >> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> uint64_t max) >> { >> + /* >> + * The limits shouldn't be modified post writing the first block, else >> + * it would cause a mismatch between the header and the footer. >> + */ > > Can we make this *even* stricter? I think that this is something that is > easy to do wrong, and the fact that it only triggers in some situations > of misuse may easily make tests miss this issue. So ideally, we should > assert that `set_limits()` is always called before queueing any records > to the writer. This would make us error out in all situations where the > calling order is wrong. > I agree here, it makes sense to make this stricter. Like you mentioned, currently they are independent. The only way to enforce the limits is to ensure that they are dependent. > There are two ways I can see us doing that: > > - Detect any state written by `writer_add_record()` and error out if > it's set when `reftable_writer_set_limits()` is called. > Yeah I think this would be simple to do. I guess we can check `w->last_key` is set, since any record write would modify that. > - Adapt `reftable_writer_new()` so that it takes the update indices as > input and drop `reftable_writer_set_limits()` altogether. > This one is a bit harder to do because of our flow. Generally the writer is provided to callers via a callback function passed to `reftable_addition_add()`. I guess I could simply pass the data: caller -> reftable_addition_add() -> reftable_writer_new() Any direct users of `reftable_writer_new()` would simply pass the data directly. I'll play around and see if this is doable without too much refactoring and have something in the next version. > The latter might be preferable as you basically want to set limits in > all (most?) situations anyway. > >> + if (w->next) >> + BUG("update index modified after writing first block"); > > Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It > requires a bit more plumbing because we'll also hvae to adapt all > callers to handle errors. But on the one hand we don't want to die in > library code. And on the other hand we don't want to keep on adding more > dependencies on "git-compat-util.h". > Fair enough, thanks for explaining. > Patrick
Attachment:
signature.asc
Description: PGP signature