Hi Pranith,
many thanks. I have recompiled the glfsheal executable with the changes from your patch set, but without commenting out the first glfs_fini call (the one marked "Don't we need to comment this too?") it indeed still segfaults. After commenting out that one as well it seems to run fine; a rough sketch of the change I made is at the end of this mail. For the moment I can use this patched executable, until you fix it in a release.
Many thanks!

Alessandro

On Fri, 2015-05-29 at 16:03 +0530, Pranith Kumar Karampuri wrote:
> 
> On 05/29/2015 03:36 PM, Alessandro De Salvo wrote:
> > Hi Pranith,
> > thanks to you! 2-3 days are fine, don't worry. However, if you can give me the details of the compilation of glfsheal you are mentioning, we could have a quick check that everything is fine with the fix before you release. So just let me know what you prefer. Waiting 2-3 days is not a problem for me though, as it is not a critical server and I could even recreate the volumes.
> We recently introduced a code path which frees up memory in long-standing processes. It seems this is not tested when the file-snapshot feature is on; if that option is disabled the crash won't happen. "gluster volume heal <volname> info" uses the same API, but fortunately this "glfsheal" process dies as soon as the heal info output is gathered, so there is no need to call this memory-freeing code just before dying. For now we have enabled this code path (patch: http://review.gluster.org/11001) only in internal builds, not in released versions, while we stabilize that part of the code. You can take this patch for patching glfsheal.
> 
> Pranith
> > Thanks again,
> > 
> > Alessandro
> > 
> > > On 29 May 2015, at 11:54, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > 
> > > On 05/29/2015 03:16 PM, Alessandro De Salvo wrote:
> > > > Hi Pranith,
> > > > I'm definitely sure the log is correct, but you are also correct when you say there is no sign of a crash (even checking with grep!). However I do see core dumps (e.g. core.19430) in /var/log/gluster created every time I issue the heal info command. From gdb I see this:
> > > Thanks for providing the information, Alessandro. We will fix this issue. I am wondering how we can unblock you in the interim. There is a plan to release 3.7.1 in 2-3 days, I think; I can try to get this fix into that release. Let me know if you can wait that long. Another possibility is to compile just the glfsheal binary with the fix, which "gluster volume heal <volname> info" uses internally. Let me know.
> > > 
> > > Pranith.
> > > > 
> > > > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
> > > > Copyright (C) 2013 Free Software Foundation, Inc.
> > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > This is free software: you are free to change and redistribute it.
> > > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > > and "show warranty" for details.
> > > > This GDB was configured as "x86_64-redhat-linux-gnu".
> > > > For bug reporting instructions, please see:
> > > > <http://www.gnu.org/software/gdb/bugs/>...
> > > > Reading symbols from /usr/sbin/glfsheal...Reading symbols from /usr/lib/debug/usr/sbin/glfsheal.debug...done.
> > > > done.
> > > > [New LWP 19430]
> > > > [New LWP 19431]
> > > > [New LWP 19434]
> > > > [New LWP 19436]
> > > > [New LWP 19433]
> > > > [New LWP 19437]
> > > > [New LWP 19432]
> > > > [New LWP 19435]
> > > > [Thread debugging using libthread_db enabled]
> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > Core was generated by `/usr/sbin/glfsheal adsnet-vm-01'.
> > > > Program terminated with signal 11, Segmentation fault.
> > > > #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> > > > 499             table = inode->table;
> > > > (gdb) bt
> > > > #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> > > > #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
> > > > #2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
> > > > #3  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000d450) at xlator.c:453
> > > > #4  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000e800) at xlator.c:453
> > > > #5  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000fbb0) at xlator.c:453
> > > > #6  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20010f80) at xlator.c:453
> > > > #7  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20012330) at xlator.c:453
> > > > #8  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a200136e0) at xlator.c:453
> > > > #9  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20014b30) at xlator.c:453
> > > > #10 0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20015fc0) at xlator.c:453
> > > > #11 0x00007f7a39a54eea in xlator_tree_fini (xl=<optimized out>) at xlator.c:545
> > > > #12 0x00007f7a39a90b25 in glusterfs_graph_deactivate (graph=<optimized out>) at graph.c:340
> > > > #13 0x00007f7a38d50e3c in pub_glfs_fini (fs=fs@entry=0x7f7a3a6b6010) at glfs.c:1155
> > > > #14 0x00007f7a39f18ed4 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:821
> > > > 
> > > > Thanks,
> > > > 
> > > > Alessandro
> > > > 
> > > > > On 29 May 2015, at 11:12, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > > 
> > > > > On 05/29/2015 02:37 PM, Alessandro De Salvo wrote:
> > > > > > Hi Pranith,
> > > > > > many thanks for the help!
> > > > > > The volume info of the problematic volume is the following:
> > > > > > 
> > > > > > # gluster volume info adsnet-vm-01
> > > > > > 
> > > > > > Volume Name: adsnet-vm-01
> > > > > > Type: Replicate
> > > > > > Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
> > > > > > Status: Started
> > > > > > Number of Bricks: 1 x 2 = 2
> > > > > > Transport-type: tcp
> > > > > > Bricks:
> > > > > > Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
> > > > > > Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
> > > > > > Options Reconfigured:
> > > > > > nfs.disable: true
> > > > > > features.barrier: disable
> > > > > > features.file-snapshot: on
> > > > > > server.allow-insecure: on
> > > > > Are you sure the attached log is correct? I do not see any backtrace in the log file to indicate there is a crash :-(. Could you do "grep -i crash /var/log/glusterfs/*" to see if there is some other file with the crash? If that also fails, would it be possible for you to provide the backtrace of the core by opening it with gdb?
> > > > > 
> > > > > Pranith
> > > > > > 
> > > > > > The log is attached.
> > > > > > I just wanted to add that the heal info command works fine on the other volumes hosted by the same machines, so it's just this volume that is causing problems.
> > > > > > Thanks,
> > > > > > 
> > > > > > Alessandro
> > > > > > 
> > > > > > > On 29 May 2015, at 10:50, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > > > > 
> > > > > > > On 05/29/2015 02:18 PM, Pranith Kumar Karampuri wrote:
> > > > > > > > 
> > > > > > > > On 05/29/2015 02:13 PM, Alessandro De Salvo wrote:
> > > > > > > > > Hi,
> > > > > > > > > I'm facing a strange issue with split-brain reporting. I have upgraded to 3.7.0, after stopping all gluster processes as described in the twiki, on all servers hosting the volumes. The upgrade and the restart went fine, and the volumes are accessible.
> > > > > > > > > However, I had two files in split brain that I did not heal before upgrading, so I tried a full heal with 3.7.0. The heal was launched correctly, but when I now perform a heal info there is no output, while the heal statistics say there are actually 2 files in split brain. In the logs I see something like this:
> > > > > > > > > 
> > > > > > > > > glustershd.log:
> > > > > > > > > [2015-05-29 08:28:43.008373] I [afr-self-heal-entry.c:558:afr_selfheal_entry_do] 0-adsnet-gluster-01-replicate-0: performing entry selfheal on 7fd1262d-949b-402e-96c2-ae487c8d4e27
> > > > > > > > > [2015-05-29 08:28:43.012690] W [client-rpc-fops.c:241:client3_3_mknod_cbk] 0-adsnet-gluster-01-client-1: remote operation failed: Invalid argument. Path: (null)
> > > > > > > > Hey, could you let us know the "gluster volume info" output? Please also let us know the backtrace printed in /var/log/glusterfs/glfsheal-<volname>.log.
> > > > > > > Please attach the /var/log/glusterfs/glfsheal-<volname>.log file to this thread so that I can take a look.
> > > > > > > 
> > > > > > > Pranith
> > > > > > > > 
> > > > > > > > Pranith
> > > > > > > > > 
> > > > > > > > > So it seems like the files to be healed are not correctly identified, or at least their path is null. Also, every time I issue a "gluster volume heal <volname> info" a core dump is generated in the log area. All servers are running the latest CentOS 7.
> > > > > > > > > Any idea why this might be happening and how to solve it?
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > Alessandro
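
P.S. For reference, the call I ended up commenting out as well is the glfs_fini() made from main() (frame #14 in the backtrace, glfs-heal.c:821). The snippet below is only an illustrative sketch, not the real glfs-heal.c source: the skeleton around it (argument handling, the bare glfs_new() call, the header path) is made up, and the real glfsheal does all the heal-info work where the placeholder comment sits. The only relevant part is the commented-out teardown at the end:

/* Sketch only: skeleton and names are assumptions, not the real glfs-heal.c. */
#include <stdio.h>
#include <stdlib.h>
#include <glusterfs/api/glfs.h>   /* libgfapi; header location may vary per install */

int
main (int argc, char *argv[])
{
        glfs_t *fs = NULL;

        if (argc < 2) {
                fprintf (stderr, "Usage: %s <volname>\n", argv[0]);
                return EXIT_FAILURE;
        }

        fs = glfs_new (argv[1]);
        if (!fs)
                return EXIT_FAILURE;

        /* ... glfs_set_volfile_server(), glfs_init() and the actual
         *     heal-info gathering/printing would happen here ... */

        /*
         * glfs_fini (fs);
         *
         * Commented out: per the backtrace, pub_glfs_fini() walks
         * glusterfs_graph_deactivate() -> xlator_fini_rec() into the
         * qemu-block (file-snapshot) xlator's fini(), which segfaults
         * in inode_unref().  As you pointed out, glfsheal exits right
         * after printing its output, so skipping the explicit teardown
         * simply lets the OS reclaim the memory.
         */

        return EXIT_SUCCESS;
}

With this change, on top of your patch set, glfsheal no longer seems to dump core on this volume.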