Re: [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Thu, Mar 21, 2019 at 8:44 AM Yaniv Kaul <ykaul@xxxxxxxxxx> wrote:


On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:


On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
All,

In the last few releases of glusterfs, with stability as a primary theme of the releases, there has been lots of changes done on the code optimization with an expectation that such changes will have gluster to provide better performance. While many of these changes do help, but off late we have started seeing some diverse effects of them, one especially being the calloc to malloc conversions. While I do understand that malloc syscall will eliminate the extra memset bottleneck which calloc bears, but with recent kernels having in-built strong compiler optimizations I am not sure whether that makes any significant difference, but as I mentioned earlier certainly if this isn't done carefully it can potentially introduce lot of bugs and I'm writing this email to share one of such experiences.

Sanju & I were having troubles for last two days to figure out why https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in Sanju's system but it had no problems running the same fix in my gluster containers. After spending a significant amount of time, what we now figured out is that a malloc call [1] (which was a calloc earlier) is the culprit here. As you all can see, in this function we allocate txn_id and copy the event->txn_id into it through gf_uuid_copy () . But when we were debugging this step wise through gdb, txn_id wasn't exactly copied with the exact event->txn_id and it had some junk values which made the glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on resulting the leaks to remain the same which was the original intention of the fix.

This was quite painful to debug and we had to spend some time to figure this out. Considering we have converted many such calls in past, I'd urge that we review all such conversions and see if there're any side effects to it. Otherwise we might end up running into many potential memory related bugs later on. OTOH, going forward I'd request every patch owners/maintainers to pay some special attention to these conversions and see they are really beneficial and error free. IMO, general guideline should be - for bigger buffers, malloc would make better sense but has to be done carefully, for smaller size, we stick to calloc.

What do others think about it?

I believe that replacing calloc with malloc everywhere without adequate testing and review is not safe and am against doing so for the following reasons:

No patch should get in without adequate testing and thorough review.


There are lots of interesting points to glean in this thread. However, this particular one caught my attention. How about we introduce a policy that no patch gets merged unless it is thoroughly tested? The onus would be on the developer to provide a .t test case to show completeness in the testing of that patch. If the developer does not or cannot for any reason, we could have the maintainer run tests and add a note in gerrit explaining the tests run. This would provide more assurance about the patches being tested before getting merged. Obviously, patches that fix typos or that cannot affect any functionality need not be subject to this policy.

As far as review thoroughness is concerned, it might be better to mandate acks from respective maintainers before merging a patch that affects several components. More eyeballs that specialize in particular component(s) will hopefully catch some of these issues during the review phase.

Thanks,
Vijay
 
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux