On 9/19/2022 1:47 PM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx>
>
> Though the Roaring library is introduced in previous commit, the library
> cannot be used as is. One reason is that the library doesn't support Big
> endian machines. Besides, Git specific file related functions does use
> `hashwrite()` (or similar). So there is a need to modify the library.

There are a few refactorings happening in this single patch, so it
might be good to split them out for easier spot-checking from the
reviewer's perspective. I'll try to list the ones I see.

>  int32_t array_container_write(const array_container_t *container, char *buf);
> +
> +int array_container_network_write(const array_container_t *container,
> +		int (*write_fn) (void *, const void *, size_t),
> +		void *data);

Should we make write_fn a defined type? I'm not sure I've seen this
implicit type within a function declaration before.

>  /**
>   * Reads the instance from buf, outputs how many bytes were read.
>   * This is meant to be byte-by-byte compatible with the Java and Go versions of
> @@ -1801,6 +1805,9 @@ int32_t array_container_write(const array_container_t *container, char *buf);
>  int32_t array_container_read(int32_t cardinality, array_container_t *container,
>                               const char *buf);
>
> +int32_t array_container_network_read(int32_t cardinality, array_container_t *container,
> +		const char *buf);
> +

Both of these functions are creating new implementations instead of
modifying the existing implementations. Is there any reason why we
should keep both of these in perpetuity? They are likely to drift if
we do that.
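On the write_fn question above, a named function-pointer type could
look roughly like the sketch below. The name roaring_write_fn, the
in-memory sink, and the convention that the callback returns 0 on
success are all assumptions made for illustration; the patch does not
spell out any of them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; the patch writes this type inline in every prototype. */
typedef int (*roaring_write_fn)(void *data, const void *buf, size_t len);

/* Hypothetical in-memory sink, used only to exercise the callback. */
struct mem_sink {
	unsigned char bytes[64];
	size_t len;
};

/* Assumed convention: return 0 on success, -1 if the sink is full. */
static int mem_sink_write(void *data, const void *buf, size_t len)
{
	struct mem_sink *sink = data;

	if (len > sizeof(sink->bytes) - sink->len)
		return -1;
	memcpy(sink->bytes + sink->len, buf, len);
	sink->len += len;
	return 0;
}

/* Writes one 16-bit value, most significant byte first, via the callback. */
static int write_u16_network(uint16_t value, roaring_write_fn write_fn,
			     void *data)
{
	unsigned char out[2];

	out[0] = (unsigned char)(value >> 8);
	out[1] = (unsigned char)(value & 0xff);
	return write_fn(data, out, sizeof(out));
}
```

With the typedef in place, each prototype takes a
"roaring_write_fn write_fn" parameter instead of repeating the full
pointer signature.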
> +static int container_network_write(const container_t *c, uint8_t typecode,
> +		int (*write_fn) (void *, const void *, size_t),
> +		void *data)
> +{
> +	c = container_unwrap_shared(c, &typecode);
> +	switch (typecode) {
> +	case BITSET_CONTAINER_TYPE:
> +		return bitset_container_network_write(const_CAST_bitset(c), write_fn, data);
> +	case ARRAY_CONTAINER_TYPE:
> +		return array_container_network_write(const_CAST_array(c), write_fn, data);
> +	case RUN_CONTAINER_TYPE:
> +		return run_container_network_write(const_CAST_run(c), write_fn, data);
> +	}
> +	assert(false);
> +	__builtin_unreachable();
> +	return 0;
> +}
> +

This similarly is a copy of an existing function. Instead we should
probably make all writers/readers expect network byte order (for all
multi-byte integers).

> +static size_t ra_portable_network_size_in_bytes(const roaring_array_t *ra)
> +{
> +	size_t count = ra_portable_network_header_size(ra);
> +
> +	for (int32_t k = 0; k < ra->size; ++k)

We have not loosened the restriction on defining iterator variables
within the for and instead would need this in the outer block. One
possible refactoring would be to move these definitions everywhere
within roaring.c.

> @@ -8603,16 +8981,16 @@ extern inline void roaring_bitmap_remove_range(roaring_bitmap_t *r, uint64_t min
>  void roaring_bitmap_printf(const roaring_bitmap_t *r) {
>      const roaring_array_t *ra = &r->high_low_container;
>
> -    printf("{");
> +    fprintf(stderr, "{");
>      for (int i = 0; i < ra->size; ++i) {
>          container_printf_as_uint32_array(ra->containers[i], ra->typecodes[i],
>                                           ((uint32_t)ra->keys[i]) << 16);
>
>          if (i + 1 < ra->size) {
> -            printf(",");
> +            fprintf(stderr, ",");
>          }
>      }
> -    printf("}");
> +    fprintf(stderr, "}");
>  }

This change is confusing to me. I expect the printf() to print to
stdout, and this might be used in a test helper or something. If you
really want this to go somewhere other than stdout, then the method
should be changed to take an arbitrary FILE*.
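To make the FILE* suggestion concrete, here is a minimal sketch.
print_uint32_set() is a hypothetical stand-in and not a CRoaring
function; it only shows the shape of a printer that lets the caller
pick the stream. It also declares its loop index outside the for, as
discussed above.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: prints values as "{a,b,c}" to whichever stream
 * the caller passes (stdout, stderr, or a file opened by a test).
 */
static void print_uint32_set(FILE *out, const uint32_t *values, size_t nr)
{
	size_t i; /* declared in the outer block, not inside the for */

	fputc('{', out);
	for (i = 0; i < nr; i++) {
		if (i)
			fputc(',', out);
		fprintf(out, "%" PRIu32, values[i]);
	}
	fputc('}', out);
}
```

A caller that wants today's behavior passes stdout; a debugging caller
passes stderr.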
> +void roaring_bitmap_free_safe(roaring_bitmap_t **r)
> +{
> +	if (*r) {
> +		roaring_bitmap_free((const roaring_bitmap_t *)*r);
> +		r = NULL;

I think you want "*r = NULL" here, if you are intending to free and
NULL the given address.

This method seems separate from the network-byte-order changes.

> +	}
> +}
> +
> +size_t roaring_bitmap_network_portable_size_in_bytes(const roaring_bitmap_t *r)
> +{
> +	return ra_portable_network_size_in_bytes(&r->high_low_container);
> +}

Does network order change the potential size of the bitmap?

> +roaring_bitmap_t *roaring_bitmap_portable_network_deserialize_safe(const char *buf, size_t maxbytes)
> +{
> +	roaring_bitmap_t *ans =
> +		(roaring_bitmap_t *)roaring_malloc(sizeof(roaring_bitmap_t));
> +	if (ans == NULL) {
> +		return NULL;
> +	}

nit: Lose braces around single-line blocks.

> +	size_t bytesread;
> +	bool is_ok = ra_portable_network_deserialize(&ans->high_low_container, buf, maxbytes, &bytesread);

Declare all variables before your logic. I think this will fail if
you run "make DEVELOPER=1".

> +	if(is_ok) assert(bytesread <= maxbytes);

nit: break lines for if bodies.

> +	roaring_bitmap_set_copy_on_write(ans, false);
> +	if (!is_ok) {
> +		roaring_free(ans);
> +		return NULL;
> +	}
> +	return ans;
> +}
> +
>  size_t roaring_bitmap_portable_serialize(const roaring_bitmap_t *r,
>                                           char *buf) {
>      return ra_portable_serialize(&r->high_low_container, buf);
>  }
>
> +int roaring_bitmap_portable_network_serialize(roaring_bitmap_t *rb,
> +		int (*write_fn) (void *, const void *, size_t),
> +		void *data)
> +{
> +	return ra_portable_network_serialize(&rb->high_low_container, write_fn, data);
> +}

I'm not sure why these methods are created as wrappers instead of
renaming the base methods.
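For the "*r = NULL" point above, a self-contained sketch of the
free-and-clear pattern follows. struct bitmap and bitmap_free() are
stand-ins invented so the example compiles on its own; the real code
would use roaring_bitmap_t and roaring_bitmap_free(). The extra check
on r itself is also an addition, not something the patch has.

```c
#include <stdlib.h>

/* Stand-in type so the sketch compiles alone (assumption). */
struct bitmap {
	int dummy;
};

/* Stand-in for roaring_bitmap_free(). */
static void bitmap_free(struct bitmap *b)
{
	free(b);
}

/* Frees *r if it is set, then clears the caller's pointer. */
static void bitmap_free_safe(struct bitmap **r)
{
	if (!r || !*r)
		return;
	bitmap_free(*r);
	*r = NULL; /* assigning to "r" would only change the local copy */
}
```

Because the caller's pointer is cleared, calling the function a second
time on the same variable is a harmless no-op.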
>  roaring_bitmap_t *roaring_bitmap_deserialize(const void *buf) {
>      const char *bufaschar = (const char *)buf;
>      if (*(const unsigned char *)buf == CROARING_SERIALIZATION_ARRAY_UINT32) {
> @@ -13827,9 +14247,9 @@ void array_container_printf_as_uint32_array(const array_container_t *v,
>      if (v->cardinality == 0) {
>          return;
>      }
> -    printf("%u", v->array[0] + base);
> +    fprintf(stderr, "%u", v->array[0] + base);
>      for (int i = 1; i < v->cardinality; ++i) {
> -        printf(",%u", v->array[i] + base);
> +        fprintf(stderr, ",%u", v->array[i] + base);

Here's another printf to fprintf situation that is unclear to me.

> @@ -15208,13 +15659,13 @@ void run_container_printf_as_uint32_array(const run_container_t *cont,
>      {
>          uint32_t run_start = base + cont->runs[0].value;
>          uint16_t le = cont->runs[0].length;
> -        printf("%u", run_start);
> -        for (uint32_t j = 1; j <= le; ++j) printf(",%u", run_start + j);
> +        fprintf(stderr, "%u", run_start);
> +        for (uint32_t j = 1; j <= le; ++j) fprintf(stderr, ",%u", run_start + j);

Ditto here. I see we are inheriting off-style code from the original.

> +/**
> + * Frees the memory if exists
> + */
> +void roaring_bitmap_free_safe(roaring_bitmap_t **r);

And nullifies the pointer, don't forget!

In general, I think this change would be a lot smaller if you took the
existing implementation and inserted the proper ntohl() and htonl()
conversions. Git will never call the other versions, so why keep them
in the tree? Why require re-checking all of the format logic here
instead of only the places where we write multi-byte words?

Thanks,
-Stolee
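P.S. As a rough illustration of converting only at the points where a
multi-byte word is written or read, consider the sketch below. The
helper names put_u32_network() and get_u32_network() are hypothetical,
and the example assumes a POSIX system that provides htonl() and
ntohl() in <arpa/inet.h>.

```c
#include <arpa/inet.h> /* htonl(), ntohl(); POSIX */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores one 32-bit word in network byte order. */
static size_t put_u32_network(char *buf, uint32_t value)
{
	uint32_t be = htonl(value);

	memcpy(buf, &be, sizeof(be));
	return sizeof(be);
}

/* Hypothetical helper: loads one 32-bit word stored in network byte order. */
static uint32_t get_u32_network(const char *buf)
{
	uint32_t be;

	memcpy(&be, buf, sizeof(be));
	return ntohl(be);
}
```

The surrounding format logic stays untouched; only the raw word copies
in the existing writer and reader would be swapped for calls like
these.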