On 10/30/2022 2:35 AM, Abhradeep Chakraborty wrote:
> Hello all,
>
> It has been a month since I didn't get involved in any open source
> contributions (including Git). This is due to the fact that I was
> focusing more on mastering theories and also that it was a festive
> month. So, I am now resuming my work. There are many things I have to
> cover (including this patch series).
> But before that I want to ask you a question - As you have noticed
> already, the Roaring library has a lot of styling issues (Moreover it
> is using C11). So Should I fix all these issues? or Should I make a
> new library (using Git's compatibility library "git-compat-util.h") by
> taking CRoaring as a reference? The pros are that it would be easier
> to format the bitmap library specific files and it can use Git
> compatible functions.
>
> I would love to hear your opinions. Thanks :)

I HAVE OPINIONS! :D

Mostly, there are three things I'd like for you to keep in mind:

1. Using the library as-is is a great way to prototype and dig in on
   the performance measurement side. Can you construct or clone enough
   interesting repositories to get a feeling for the effect of the
   Roaring format compared to the EWAH format? If there is no benefit
   to switching, then we can save everyone a lot of work by marking
   that road as a dead end. However, if there is sufficient evidence
   that it's working well, then we have established a baseline that
   the full implementation should at least match, if not beat.

2. Once we decide to do the work, we can think about the reasons to
   use the existing library over writing our own. The most basic
   reason is that the library is extensively tested, so we gain all of
   those benefits. Can we incorporate their test suite into our own?
   The next main benefit is that we can take any changes from their
   version into our code with minimal fuss. How often do you think
   that they have bug fixes or enhancements in the repo? How would
   those changes translate into our mailing list workflow? If we
   restyled the library, then we would be unlikely to get easy
   benefits from taking upstream changes, but we could recreate them
   with manual effort.

3. After carefully considering the benefits and drawbacks of using the
   existing library, consider the same for writing one from scratch.
   The most important thing I will say here is that the core idea is
   rather simple. There may even be ways that we can take advantage of
   the format and its data structures with the expectations we have in
   Git repositories that are not always possible for generic
   databases. We should be able to build a much smaller library that's
   limited to our needs and customized to our use case. However, we
   would need to test it carefully, both for correctness and for
   performance, and that is not a small undertaking.

Hopefully this gives you something to chew on. Investigating each of
these directions should help you come to a conclusion that you can
bring to the community as the expert; then we can examine your
findings to see if we agree.

Remember that code speaks. If you're willing to build it one way, then
that concrete implementation is already worth more than a hypothetical
alternative in many regards. That can be a starting point to move
forward.

Thanks,
-Stolee
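
For reference, the "core idea" from point 3 can be sketched in under a
hundred lines of C. This is only an illustration under stated
assumptions, not CRoaring's API and not proposed Git code: every sk_*
name is hypothetical, run containers are left out, and containers are
kept in an unsorted list with linear lookups where a real
implementation would keep them sorted and use binary search. Each
32-bit value is split into a 16-bit key and a 16-bit low part; a
container starts as a sorted array and becomes a fixed 8 KB bitmap once
it would exceed 4096 entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_ARRAY_MAX 4096	/* 4096 * 2 bytes == 8 KB, the size of a bitmap */
#define SK_WORDS 1024		/* 65536 bits / 64 bits per word */

struct sk_container {
	uint16_t key;		/* high 16 bits shared by every value stored here */
	int is_bitmap;
	uint32_t nr;		/* number of values in this container */
	uint16_t *array;	/* sparse form: sorted low 16 bits */
	uint64_t *words;	/* dense form: SK_WORDS words, one bit per value */
};
struct sk_roaring {
	struct sk_container *c;	/* unsorted in this sketch */
	size_t nr;
};

static void *sk_realloc(void *p, size_t n)
{
	if (!(p = realloc(p, n)))
		abort();	/* sketch only: no graceful error handling */
	return p;
}

/* Find the container for 'key', optionally creating an empty one. */
static struct sk_container *sk_find(struct sk_roaring *r, uint16_t key, int create)
{
	size_t i;
	for (i = 0; i < r->nr; i++)
		if (r->c[i].key == key)
			return &r->c[i];
	if (!create)
		return NULL;
	r->c = sk_realloc(r->c, (r->nr + 1) * sizeof(*r->c));
	memset(&r->c[r->nr], 0, sizeof(*r->c));
	r->c[r->nr].key = key;
	return &r->c[r->nr++];
}

static uint32_t sk_pos(const struct sk_container *c, uint16_t low)
{
	uint32_t pos = 0;
	while (pos < c->nr && c->array[pos] < low)
		pos++;		/* index of the first entry >= low */
	return pos;
}

static void sk_to_bitmap(struct sk_container *c)
{
	uint32_t i;
	assert(c->nr == SK_ARRAY_MAX);	/* only called on a full array */
	c->words = sk_realloc(NULL, SK_WORDS * sizeof(uint64_t));
	memset(c->words, 0, SK_WORDS * sizeof(uint64_t));
	for (i = 0; i < c->nr; i++)
		c->words[c->array[i] >> 6] |= (uint64_t)1 << (c->array[i] & 63);
	free(c->array);
	c->array = NULL;
	c->is_bitmap = 1;
}

void sk_set(struct sk_roaring *r, uint32_t value)
{
	struct sk_container *c = sk_find(r, value >> 16, 1);
	uint16_t low = value & 0xffff;
	uint32_t pos;
	if (!c->is_bitmap) {
		pos = sk_pos(c, low);
		if (pos < c->nr && c->array[pos] == low)
			return;	/* already present */
		if (c->nr < SK_ARRAY_MAX) {
			c->array = sk_realloc(c->array, (c->nr + 1) * sizeof(uint16_t));
			memmove(c->array + pos + 1, c->array + pos, (c->nr - pos) * sizeof(uint16_t));
			c->array[pos] = low;
			c->nr++;
			return;
		}
		sk_to_bitmap(c);	/* array is full: switch to the dense form */
	}
	if (!(c->words[low >> 6] & ((uint64_t)1 << (low & 63))))
		c->nr++;
	c->words[low >> 6] |= (uint64_t)1 << (low & 63);
}

int sk_get(struct sk_roaring *r, uint32_t value)
{
	struct sk_container *c = sk_find(r, value >> 16, 0);
	uint16_t low = value & 0xffff;
	uint32_t pos;
	if (!c)
		return 0;
	if (c->is_bitmap)
		return (c->words[low >> 6] >> (low & 63)) & 1;
	pos = sk_pos(c, low);
	return pos < c->nr && c->array[pos] == low;
}
```

A caller would start from a zero-initialized "struct sk_roaring",
call sk_set() for each object position, and query with sk_get(). The
parts that make a real implementation hard (set operations across
containers, serialization, and the testing mentioned above) are exactly
what this sketch omits.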