On Tue, Mar 26, 2019 at 02:52:33PM -0700, Vijay Bellur wrote: > On Thu, Mar 21, 2019 at 8:44 AM Yaniv Kaul <ykaul@xxxxxxxxxx> wrote: > > > > > > > On Thu, Mar 21, 2019 at 5:23 PM Nithya Balachandran <nbalacha@xxxxxxxxxx> > > wrote: > > > >> > >> > >> On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote: > >> > >>> All, > >>> > >>> In the last few releases of glusterfs, with stability as a primary theme > >>> of the releases, there has been lots of changes done on the code > >>> optimization with an expectation that such changes will have gluster to > >>> provide better performance. While many of these changes do help, but off > >>> late we have started seeing some diverse effects of them, one especially > >>> being the calloc to malloc conversions. While I do understand that malloc > >>> syscall will eliminate the extra memset bottleneck which calloc bears, but > >>> with recent kernels having in-built strong compiler optimizations I am not > >>> sure whether that makes any significant difference, but as I mentioned > >>> earlier certainly if this isn't done carefully it can potentially introduce > >>> lot of bugs and I'm writing this email to share one of such experiences. > >>> > >>> Sanju & I were having troubles for last two days to figure out why > >>> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in > >>> Sanju's system but it had no problems running the same fix in my gluster > >>> containers. After spending a significant amount of time, what we now > >>> figured out is that a malloc call [1] (which was a calloc earlier) is the > >>> culprit here. As you all can see, in this function we allocate txn_id and > >>> copy the event->txn_id into it through gf_uuid_copy () . But when we were > >>> debugging this step wise through gdb, txn_id wasn't exactly copied with the > >>> exact event->txn_id and it had some junk values which made the > >>> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on > >>> resulting the leaks to remain the same which was the original intention of > >>> the fix. > >>> > >>> This was quite painful to debug and we had to spend some time to figure > >>> this out. Considering we have converted many such calls in past, I'd urge > >>> that we review all such conversions and see if there're any side effects to > >>> it. Otherwise we might end up running into many potential memory related > >>> bugs later on. OTOH, going forward I'd request every patch > >>> owners/maintainers to pay some special attention to these conversions and > >>> see they are really beneficial and error free. IMO, general guideline > >>> should be - for bigger buffers, malloc would make better sense but has to > >>> be done carefully, for smaller size, we stick to calloc. > >>> > >> > >>> What do others think about it? > >>> > >> > >> I believe that replacing calloc with malloc everywhere without adequate > >> testing and review is not safe and am against doing so for the following > >> reasons: > >> > > > > No patch should get in without adequate testing and thorough review. > > > > > There are lots of interesting points to glean in this thread. However, this > particular one caught my attention. How about we introduce a policy that no > patch gets merged unless it is thoroughly tested? The onus would be on the > developer to provide a .t test case to show completeness in the testing of > that patch. If the developer does not or cannot for any reason, we could > have the maintainer run tests and add a note in gerrit explaining the tests > run. This would provide more assurance about the patches being tested > before getting merged. Obviously, patches that fix typos or that cannot > affect any functionality need not be subject to this policy. > > As far as review thoroughness is concerned, it might be better to mandate > acks from respective maintainers before merging a patch that affects > several components. More eyeballs that specialize in particular > component(s) will hopefully catch some of these issues during the review > phase. Both of these points have always been strongly encouraged. They are also documented in the https://docs.gluster.org/en/latest/Contributors-Guide/Guidelines-For-Maintainers/ https://github.com/gluster/glusterdocs/blob/master/docs/Contributors-Guide/Guidelines-For-Maintainers.md (formatting is broken in the 1st link, but I dont know how to fix it) We probably need to apply our own guidelines a little better, and remember developers that > 90% of the patch(series) should come with a .t file or added test in an existing one. And a big +1 for getting reviews or at least some involvement of the component maintainers. Niels _______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx https://lists.gluster.org/mailman/listinfo/gluster-devel