Re: [PATCH 2/5] roaring.[ch]: apply Git specific changes to the roaring API

On 9/19/2022 1:47 PM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx>
> 
> Though the Roaring library is introduced in previous commit, the library
> cannot be used as is. One reason is that the library doesn't support Big
> endian machines. Besides, Git specific file related functions does use
> `hashwrite()` (or similar). So there is a need to modify the library.

There are a few refactorings happening in this single patch, so it
might be good to split them out for easier spot-checking from the
reviewer's perspective. I'll try to list the ones I see.
 

>  int32_t array_container_write(const array_container_t *container, char *buf);
> +
> +int array_container_network_write(const array_container_t *container,
> +				  int (*write_fn) (void *, const void *, size_t),
> +				  void *data);

Should we make write_fn a defined type? I'm not sure I've seen this
implicit type within a function declaration before.
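
To make the suggestion concrete, here is a rough sketch of what a named callback type could look like. The name roaring_write_fn, the mem_sink example and the "0 on success" return convention are all made up for illustration; the patch does not define any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name; the patch spells this type out inline instead. */
typedef int (*roaring_write_fn)(void *data, const void *buf, size_t len);

/* Example sink that appends to a fixed-size memory buffer. */
struct mem_sink {
	unsigned char bytes[64];
	size_t len;
};

/* Assumed convention for this sketch: 0 on success, -1 on failure. */
static int mem_sink_write(void *data, const void *buf, size_t len)
{
	struct mem_sink *sink = data;

	if (len > sizeof(sink->bytes) - sink->len)
		return -1;
	memcpy(sink->bytes + sink->len, buf, len);
	sink->len += len;
	return 0;
}

/* With the typedef, the parameter list stays short and readable. */
static int write_magic(roaring_write_fn write_fn, void *data)
{
	static const unsigned char magic[4] = { 'R', 'O', 'A', 'R' };

	assert(write_fn);
	return write_fn(data, magic, sizeof(magic));
}
```

Each of the *_network_write() declarations could then take a roaring_write_fn instead of repeating the full function pointer signature.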

>  /**
>   * Reads the instance from buf, outputs how many bytes were read.
>   * This is meant to be byte-by-byte compatible with the Java and Go versions of
> @@ -1801,6 +1805,9 @@ int32_t array_container_write(const array_container_t *container, char *buf);
>  int32_t array_container_read(int32_t cardinality, array_container_t *container,
>                               const char *buf);
>  
> +int32_t array_container_network_read(int32_t cardinality, array_container_t *container,
> +                        	     const char *buf);
> +

Both of these functions are creating new implementations instead
of modifying the existing implementations. Is there any reason
why we should keep both of these in perpetuity? They are likely
to drift if we do that.

> +static int container_network_write(const container_t *c, uint8_t typecode,
> +				   int (*write_fn) (void *, const void *, size_t),
> +				   void *data)
> +{
> +	c = container_unwrap_shared(c, &typecode);
> +	switch (typecode) {
> +		case BITSET_CONTAINER_TYPE:
> +			return bitset_container_network_write(const_CAST_bitset(c), write_fn, data);
> +		case ARRAY_CONTAINER_TYPE:
> +			return array_container_network_write(const_CAST_array(c), write_fn, data);
> +		case RUN_CONTAINER_TYPE:
> +			return run_container_network_write(const_CAST_run(c), write_fn, data);
> +	}
> +	assert(false);
> +	__builtin_unreachable();
> +	return 0;
> +}
> +

This is similarly a copy of an existing function. Instead we
should probably make all writers/readers expect network byte
order (for all multi-byte integers).

> +static size_t ra_portable_network_size_in_bytes(const roaring_array_t *ra)
> +{
> +	size_t count = ra_portable_network_header_size(ra);
> +
> +	for (int32_t k = 0; k < ra->size; ++k)

We have not loosened the restriction on declaring the loop variable
inside the for statement, so this declaration would need to move to
the enclosing block. One possible refactoring would be to move all
such declarations throughout roaring.c.
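
As a small illustration of the shape I mean, with the loop variable declared at the top of the block. struct sizes and total_bytes() are stand-ins invented for this sketch, not anything from roaring.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for roaring_array_t; only the fields the loop needs. */
struct sizes {
	int32_t size;
	const size_t *container_bytes;
};

static size_t total_bytes(const struct sizes *s, size_t header_bytes)
{
	size_t count = header_bytes;
	int32_t k; /* declared at the top of the block, not in the for */

	assert(s);
	for (k = 0; k < s->size; ++k)
		count += s->container_bytes[k];
	return count;
}
```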

> @@ -8603,16 +8981,16 @@ extern inline void roaring_bitmap_remove_range(roaring_bitmap_t *r, uint64_t min
>  void roaring_bitmap_printf(const roaring_bitmap_t *r) {
>      const roaring_array_t *ra = &r->high_low_container;
>  
> -    printf("{");
> +    fprintf(stderr, "{");
>      for (int i = 0; i < ra->size; ++i) {
>          container_printf_as_uint32_array(ra->containers[i], ra->typecodes[i],
>                                           ((uint32_t)ra->keys[i]) << 16);
>  
>          if (i + 1 < ra->size) {
> -            printf(",");
> +            fprintf(stderr, ",");
>          }
>      }
> -    printf("}");
> +    fprintf(stderr, "}");
>  }

This change is confusing to me. I expect printf() to print to
stdout, and this might be used in a test helper or something. If
you really want this to go somewhere other than stdout, then the
method should be changed to take an arbitrary FILE*.
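
Something along these lines is what I have in mind. print_uint32_set() is a made-up helper that only shows the FILE* parameter; it is not the roaring API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: prints values as "{a,b,c}" to the given stream. */
static void print_uint32_set(FILE *out, const uint32_t *vals, size_t n)
{
	size_t i;

	assert(out);
	fprintf(out, "{");
	for (i = 0; i < n; i++) {
		if (i)
			fprintf(out, ",");
		fprintf(out, "%u", (unsigned)vals[i]);
	}
	fprintf(out, "}");
}
```

A caller that wants the old behavior passes stdout, and a debugging caller passes stderr, so the destination is visible at the call site.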

> +void roaring_bitmap_free_safe(roaring_bitmap_t **r)
> +{
> +	if (*r) {
> +		roaring_bitmap_free((const roaring_bitmap_t *)*r);
> +		r = NULL;

I think you want "*r = NULL" here, if you are intending to free
and NULL the given address.
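
A minimal sketch of the corrected shape, using a stand-in struct bitmap and plain free() so that it compiles on its own (the real method would call roaring_bitmap_free()):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for roaring_bitmap_t so the sketch is self-contained. */
struct bitmap {
	int dummy;
};

/* Frees *r if it is set, then clears the caller's pointer. */
static void bitmap_free_safe(struct bitmap **r)
{
	assert(r);
	if (*r) {
		free(*r);
		*r = NULL; /* "r = NULL" would only change the local copy */
	}
}
```

With "*r = NULL", calling the method twice on the same pointer is harmless because the second call sees NULL.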

This method seems separate from the network-byte-order changes.

> +	}
> +}
> +
  
> +size_t roaring_bitmap_network_portable_size_in_bytes(const roaring_bitmap_t *r)
> +{
> +	return ra_portable_network_size_in_bytes(&r->high_low_container);
> +}

Does network order change the potential size of the bitmap?

> +roaring_bitmap_t *roaring_bitmap_portable_network_deserialize_safe(const char *buf, size_t maxbytes)
> +{
> +	roaring_bitmap_t *ans =
> +		(roaring_bitmap_t *)roaring_malloc(sizeof(roaring_bitmap_t));
> +	if (ans == NULL) {
> +		return NULL;
> +	}

nit: Lose braces around single-line blocks.

> +	size_t bytesread;
> +	bool is_ok = ra_portable_network_deserialize(&ans->high_low_container, buf, maxbytes, &bytesread);

Declare all variables before your logic. I think this will fail if
you run "make DEVELOPER=1".

> +	if(is_ok) assert(bytesread <= maxbytes);

nit: break lines for if bodies.

> +	roaring_bitmap_set_copy_on_write(ans, false);
> +	if (!is_ok) {
> +		roaring_free(ans);
> +		return NULL;
> +	}
> +	return ans;
> +}
> +
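
Pulling the three style nits on this function together, the result could look roughly like the sketch below. struct thing and thing_parse() are stand-ins I invented for roaring_bitmap_t and ra_portable_network_deserialize(), so only the layout is meant to carry over:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for roaring_bitmap_t and ra_portable_network_deserialize(). */
struct thing {
	unsigned char first;
};

static bool thing_parse(struct thing *t, const char *buf, size_t maxbytes,
			size_t *bytesread)
{
	if (!maxbytes)
		return false;
	t->first = (unsigned char)buf[0];
	*bytesread = 1;
	return true;
}

static struct thing *thing_deserialize_safe(const char *buf, size_t maxbytes)
{
	/* All declarations come before the first statement. */
	struct thing *ans = malloc(sizeof(*ans));
	size_t bytesread;
	bool is_ok;

	if (!ans)
		return NULL;

	is_ok = thing_parse(ans, buf, maxbytes, &bytesread);
	if (is_ok)
		assert(bytesread <= maxbytes);
	if (!is_ok) {
		free(ans);
		return NULL;
	}
	return ans;
}
```

The multi-statement error block keeps its braces; only the single-statement bodies lose them.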

>  size_t roaring_bitmap_portable_serialize(const roaring_bitmap_t *r,
>                                           char *buf) {
>      return ra_portable_serialize(&r->high_low_container, buf);
>  }
>  
> +int roaring_bitmap_portable_network_serialize(roaring_bitmap_t *rb,
> +				     int (*write_fn) (void *, const void *, size_t),
> +				     void *data)
> +{
> +	return ra_portable_network_serialize(&rb->high_low_container, write_fn, data);
> +}

I'm not sure why these methods are created as wrappers instead of
renaming the base methods.


>  roaring_bitmap_t *roaring_bitmap_deserialize(const void *buf) {
>      const char *bufaschar = (const char *)buf;
>      if (*(const unsigned char *)buf == CROARING_SERIALIZATION_ARRAY_UINT32) {
> @@ -13827,9 +14247,9 @@ void array_container_printf_as_uint32_array(const array_container_t *v,
>      if (v->cardinality == 0) {
>          return;
>      }
> -    printf("%u", v->array[0] + base);
> +    fprintf(stderr, "%u", v->array[0] + base);
>      for (int i = 1; i < v->cardinality; ++i) {
> -        printf(",%u", v->array[i] + base);
> +        fprintf(stderr, ",%u", v->array[i] + base);

Here's another printf to fprintf situation that is unclear to me.

> @@ -15208,13 +15659,13 @@ void run_container_printf_as_uint32_array(const run_container_t *cont,
>      {
>          uint32_t run_start = base + cont->runs[0].value;
>          uint16_t le = cont->runs[0].length;
> -        printf("%u", run_start);
> -        for (uint32_t j = 1; j <= le; ++j) printf(",%u", run_start + j);
> +        fprintf(stderr, "%u", run_start);
> +        for (uint32_t j = 1; j <= le; ++j) fprintf(stderr, ",%u", run_start + j);

Ditto here. I see we are inheriting off-style code from the original.

> +/**
> + * Frees the memory if exists
> + */
> +void roaring_bitmap_free_safe(roaring_bitmap_t **r);

And nullifies the pointer, don't forget!

In general, I think this change would be a lot smaller if you took
the existing implementation and inserted the proper ntohl() and
htonl() conversions. Git will never call the other versions, so
why keep them in the tree? Why require re-checking all of the format
logic here instead of only the places where we write multi-byte
words?
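
To illustrate the kind of localized change I mean, here is a sketch of converting a single 32-bit field at the point where it is written or read. The helper names are hypothetical, and the existing serialization code may be structured differently:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes v to buf in network byte order; returns the bytes written. */
static size_t write_u32_network(char *buf, uint32_t v)
{
	uint32_t be = htonl(v);

	assert(buf);
	memcpy(buf, &be, sizeof(be));
	return sizeof(be);
}

/* Reads a network-byte-order value from buf and returns it in host order. */
static uint32_t read_u32_network(const char *buf)
{
	uint32_t be;

	assert(buf);
	memcpy(&be, buf, sizeof(be));
	return ntohl(be);
}
```

The surrounding format logic stays untouched; only the memcpy() of each multi-byte word gains a conversion.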

Thanks,
-Stolee


