Re: [PATCH 3/5] roaring: teach Git to write roaring bitmaps

On 10/30/2022 2:35 AM, Abhradeep Chakraborty wrote:
> Hello all,
> 
> It has been a month since I didn't get involved in any open source
> contributions (including Git). This is due to the fact that I was
> focusing more on mastering theories and also that it was a festive
> month. So, I am now resuming my work. There are many things I have to
> cover (including this patch series).
> But before that I want to ask you a question - As you have noticed
> already, the Roaring library has a lot of styling issues (Moreover it
> is using C11). So Should I fix all these issues? or Should I make a
> new library (using Git's compatibility library "git-compat-util.h") by
> taking CRoaring as a reference? The pros are that it would be easier
> to format the bitmap library specific files and it can use Git
> compatible functions.
> 
> I would love to hear your opinions. Thanks :)

I HAVE OPINIONS! :D

Mostly, there are three things I'd like for you to keep in mind:

1. Using the library as-is is a great way to prototype and dig in on
   the performance measurement side. Can you construct or clone enough
   interesting repositories to get a feeling of the effect of the
   roaring format compared to the EWAH format? If there is no benefit
   to switching, then we can save everyone a lot of work by ruling
   that direction out. However, if there is sufficient evidence
   that it's working well, then we have established a baseline that
   the full implementation should at least match, if not beat.

2. Once we decide to do the work, we can think about the reasons to use
   the existing library over writing our own. The most basic reason is
   that the library is extensively tested, so we gain all of those
   benefits. Can we incorporate their test suite into our own? The
   next main benefit is that we can take any changes from their version
   into our code with minimal fuss. How often do you think that they
   have bug fixes or enhancements in the repo? How would those changes
   translate into our mailing list workflow? If we restyled the library,
   then we are unlikely to get easy benefits from taking upstream
   changes, but we could recreate them with manual effort.

3. After carefully considering the benefits/drawbacks of using the
   existing library, consider the same for writing one from scratch.
   The most important thing I will say here is that the core idea is
   rather simple. There may even be ways that we can take advantage
   of the format and its data structures with the expectations we have
   in Git repositories that are not always possible for generic
   databases. We should be able to build a much smaller library that's
   limited to our needs and customized to our use case. However, we
   would need to test it carefully, both for correctness and for
   performance, and that is not a small undertaking.
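
To illustrate why the core idea is simple, here is a minimal sketch in
C of the general Roaring layout: 32-bit values are grouped by their
high 16 bits, and each group stores the low 16 bits either as a sorted
array (sparse) or as a 65536-bit bitmap (dense), switching once a group
holds more than 4096 values. Every name below is hypothetical; this is
neither CRoaring's API nor anything from the patch series, and it
leaves out run containers, set operations, and serialization. For
brevity it also preallocates each array at full capacity and finds
containers with a linear scan.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_MAX 4096     /* 4096 * 2 bytes == 8 KiB, the size of one bitmap */
#define BITMAP_WORDS 1024  /* 65536 bits / 64 bits per word */

/* Hypothetical types for illustration only. */
struct container {
	uint16_t key;      /* high 16 bits shared by every value here */
	uint32_t nr;       /* number of values stored */
	uint16_t *array;   /* sorted low 16 bits, or NULL once dense */
	uint64_t *words;   /* BITMAP_WORDS words, or NULL while sparse */
};

struct sketch_bitmap {
	struct container *c;
	size_t nr;
};

static void *xalloc(size_t n)
{
	void *p = calloc(1, n);
	if (!p)
		abort();
	return p;
}

/* Look up the container for a key, optionally creating it. */
static struct container *find(struct sketch_bitmap *b, uint16_t key, int create)
{
	size_t i;

	for (i = 0; i < b->nr; i++)
		if (b->c[i].key == key)
			return &b->c[i];
	if (!create)
		return NULL;
	b->c = realloc(b->c, (b->nr + 1) * sizeof(*b->c));
	if (!b->c)
		abort();
	memset(&b->c[b->nr], 0, sizeof(*b->c));
	b->c[b->nr].key = key;
	b->c[b->nr].array = xalloc(ARRAY_MAX * sizeof(uint16_t));
	return &b->c[b->nr++];
}

static int sketch_contains(struct sketch_bitmap *b, uint32_t v)
{
	struct container *c = find(b, (uint16_t)(v >> 16), 0);
	uint16_t low = v & 0xffff;
	uint32_t lo = 0, hi;

	if (!c)
		return 0;
	if (c->words)
		return (int)((c->words[low >> 6] >> (low & 63)) & 1);
	hi = c->nr;
	while (lo < hi) {  /* binary search for the first entry >= low */
		uint32_t mid = lo + (hi - lo) / 2;
		if (c->array[mid] < low)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < c->nr && c->array[lo] == low;
}

static void sketch_add(struct sketch_bitmap *b, uint32_t v)
{
	struct container *c;
	uint16_t low = v & 0xffff;
	uint32_t i;

	if (sketch_contains(b, v))
		return;
	c = find(b, (uint16_t)(v >> 16), 1);
	if (!c->words && c->nr == ARRAY_MAX) {
		/* The array is full: convert this container to a bitmap. */
		c->words = xalloc(BITMAP_WORDS * sizeof(uint64_t));
		for (i = 0; i < c->nr; i++)
			c->words[c->array[i] >> 6] |= (uint64_t)1 << (c->array[i] & 63);
		free(c->array);
		c->array = NULL;
	}
	if (c->words) {
		c->words[low >> 6] |= (uint64_t)1 << (low & 63);
	} else {
		/* Shift larger entries up to keep the array sorted. */
		for (i = c->nr; i > 0 && c->array[i - 1] > low; i--)
			c->array[i] = c->array[i - 1];
		c->array[i] = low;
	}
	c->nr++;
}
```

A caller would start from a zeroed struct sketch_bitmap and call
sketch_add() and sketch_contains() on 32-bit positions. A real
implementation would keep containers sorted by key, grow arrays on
demand, and would still need the correctness and performance testing
described above.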

Hopefully this gives you something to chew on. Investigating each of
these directions should help you come to a conclusion that you can
bring to the community as the expert. Then we can examine your
findings to see if we agree.

Remember that code speaks. If you're willing to build it one way,
then that concrete implementation is already worth more than a
hypothetical alternative in many regards. That can be a starting
point to move forward.

Thanks,
-Stolee