Karthik Nayak <karthik.188@xxxxxxxxx> writes: > Patrick Steinhardt <ps@xxxxxx> writes: > >> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: >>> diff --git a/reftable/writer.c b/reftable/writer.c >>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 >>> --- a/reftable/writer.c >>> +++ b/reftable/writer.c >>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, >>> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >>> uint64_t max) >>> { >>> + /* >>> + * The limits shouldn't be modified post writing the first block, else >>> + * it would cause a mismatch between the header and the footer. >>> + */ >> >> Can we make this *even* stricter? I think that this is something that is >> easy to do wrong, and the fact that it only triggers in some situations >> of misuse may easily make tests miss this issue. So ideally, we should >> assert that `set_limits()` is always called before queueing any records >> to the writer. This would make us error out in all situations where the >> calling order is wrong. >> > > I agree here, it makes sense to make this stricter. Like you mentioned, > currently they are independent. The only way to enforce the limits is to > ensure that they are dependent. > >> There are two ways I can see us doing that: >> >> - Detect any state written by `writer_add_record()` and error out if >> it's set when `reftable_writer_set_limits()` is called. >> > > Yeah I think this would be simple to do. I guess we can check > `w->last_key` is set, since any record write would modify that. > >> - Adapt `reftable_writer_new()` so that it takes the update indices as >> input and drop `reftable_writer_set_limits()` altogether. >> > > This one is a bit harder to do because of our flow. Generally the writer > is provided to callers via a callback function passed to > `reftable_addition_add()`. I guess I could simply pass the data: > > caller -> reftable_addition_add() -> reftable_writer_new() > > Any direct users of `reftable_writer_new()` would simply pass the data > directly. > > I'll play around and see if this is doable without too much refactoring > and have something in the next version. It seems like `reftable_addition_add()` is internally and externally called by a lot of functions and as such, modifying them all would be tedious. Another idea is to modify the function pointer that `reftable_addition_add()` receives so that the `write_table` argument receives a function which would set the limits, but this seems like bad design to me. So the best/easiest way to do this is to error out if any state is set before calling `reftable_writer_set_limits()`. I also realized that if `reftable_writer_set_limits()` isn't called, the writer errors anyways. So just ensuring that `reftable_writer_set_limits()` checks for any state being set would work well. [snip]
Attachment:
signature.asc
Description: PGP signature