On Tue, Apr 28, 2020 at 4:55 PM Danh Doan <congdanhqx@xxxxxxxxx> wrote: > > +#include "system.h" > > + > > +void put_be24(byte *out, uint32_t i) > > So, we introduce a new type `byte`? The reftable is structured as a library. The declaration for 'byte', as well as 'bool' is private to the library. > > +{ > > + out[0] = (byte)((i >> 16) & 0xff); > > + out[1] = (byte)((i >> 8) & 0xff); > > + out[2] = (byte)((i)&0xff); > > At my first glance, I thought that code is: > > out[2] = (byte)( (i)(&0xff) ) > > It's totally un-parsable. I removed the parens here, in https://github.com/google/reftable/commit/6f883714588d2658122fefd85a2e2e76401ae2be that also changes a number of 'int's to 'size_t's. > > + return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 | > > + (uint32_t)(in[2]); > > I don't think we need this much cast. One cast can rule them all. would you have a concrete suggestion to rewrite this? The rules for undefined behavior and integer casts are subtle, and I know they are a cause of concern in crypto code, and I'd rather be correct than rely on deep knowledge of the C standard. > > +int binsearch(int sz, int (*f)(int k, void *args), void *args) > > binary search? > > I think we want all size to be `size_t` fixed this. > > > +{ > > + int lo = 0; > > + int hi = sz; > > + > > + /* invariant: (hi == sz) || f(hi) == true > > + (lo == 0 && f(0) == true) || fi(lo) == false > > + */ > > Comment style Review style > > diff --git a/reftable/basics.h b/reftable/basics.h > > new file mode 100644 > > index 00000000000..6d89eb5d931 > > --- /dev/null > > +++ b/reftable/basics.h > > @@ -0,0 +1,53 @@ > [..snip..] > > + find smallest index i in [0, sz) at which f(i) is true, assuming > > + that f is ascending. Return sz if f(i) is false for all indices. > > +*/ > > So, this is about partitioning, not searching there is precedent for this to be called binary search. See for example, https://golang.org/pkg/sort/#Search > > +int block_writer_add(struct block_writer *w, struct record rec) > > So, we're gonna pass a large structure into a function now? The record structure contains two pointers. > > + if (block_writer_register_restart(w, start.len - out.len, restart, > > + key) < 0) { > > Eh, naming is hard, and so does this function. I can't parse this sentence. > It's overly verbose, > and long. I haven't go though all of this patch (because it's too > long), but I guess some of them can be made static and their name can > be shortened. For uniformity all functions taking a struct Xyz* as the first argument are called xyz_do_something since this takes a block_writer, and it registers a restart, it is called block_writer_register_restart. The naming of this is uniform throughout the library. > > + block_source_return_block(block->source, block); > > + block->data = uncompressed.buf; > > + block->len = dst_len; /* XXX: 4 bytes missing? */ > > Even this is a very lengthy patch, some bytes is missing but we > couldn't figured out. good point, I'll look into this again. > > + { > > This cute curly braces is new to me Pleased to make your acquaintance! > > +static int restart_key_less(int idx, void *args) > > idx? index? Shouldn't it be size_t? fixed this. > > > +{ > > + struct restart_find_args *a = (struct restart_find_args *)args; > > + uint32_t off = block_reader_restart_offset(a->r, idx); > > + struct slice in = { > > + .buf = a->r->block.data + off, > > + .len = a->r->block_len - off, > > C99 designated initialisation. > We aren't ready, yet. I'm sorry, gcc -std=c89 said this was OK. > > +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src) > > +{ > > + dest->br = src->br; > > + dest->next_off = src->next_off; > > + slice_copy(&dest->last_key, src->last_key); > > +} > > + > > +/* return < 0 for error, 0 for OK, > 0 for EOF. */ > > +int block_iter_next(struct block_iter *it, struct record rec) > > If this function is used only in this file, it should be static, > otherwise, the comment should go to header file moved comment to header file. > > diff --git a/reftable/block.h b/reftable/block.h > > new file mode 100644 > > index 00000000000..62b2e0fec6d > > +/* > > + Writes reftable blocks. The block_writer is reused across blocks to minimize > > + allocation overhead. > > +*/ > > +struct block_writer { > > + byte *buf; > > + uint32_t block_size; > > I suppose this means 2^32 block should be enough for everyone? No. 2^24-1 is the maximum size, as detailed in the specification. > > diff --git a/reftable/bytes.c b/reftable/bytes.c > > new file mode 100644 > > index 00000000000..e69de29bb2d > > Empty? thanks, I've removed these. > > diff --git a/reftable/config.h b/reftable/config.h > > new file mode 100644 > > index 00000000000..40a8c178f10 > > --- /dev/null > > +++ b/reftable/config.h > > @@ -0,0 +1 @@ > > +/* empty */ > > Empty? Removed > > +++ b/reftable/dump.c This was moved to a different directory in upstream; I've removed it now. > > +bool iterator_is_null(struct reftable_iterator it) > > +{ > > + return it.ops == NULL; > > +} > > Do we want this verbose? > Or it'll be use in some kind of vtable? I've added a doc comment. > > + > > +static int filtering_ref_iterator_next(void *iter_arg, struct record rec) > > +{ > > + struct filtering_ref_iterator *fri = > > + (struct filtering_ref_iterator *)iter_arg; > > + struct reftable_ref_record *ref = > > + (struct reftable_ref_record *)rec.data; > > We're passing a pointer, references a member of a variable in local > scope around, this will ask for trouble if callee doesn't aware about > it, and save that pointer somewhere for later access. I don't understand your criticism; could you be more precise? -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado