Hi Pranith,
many thanks. I have recompiled the glfsheal executable with the changes from your patch set, but without commenting out the first glfs_fini call (the one marked "Don't we need to comment this too?") it indeed still segfaults. After commenting out that one as well it seems to run fine; a rough sketch of the change I made is at the end of this mail. For the moment I can use this patched executable, until you fix it in a release.
Many thanks!

Alessandro

On Fri, 2015-05-29 at 16:03 +0530, Pranith Kumar Karampuri wrote:
> 
> On 05/29/2015 03:36 PM, Alessandro De Salvo wrote:
> > Hi Pranith,
> > thanks to you! 2-3 days are fine, don't worry. However, if you can give me the details of the compilation of glfsheal you are mentioning, we could have a quick check that everything is fine with the fix before you release. So just let me know what you prefer. Waiting 2-3 days is not a problem for me though, as it is not a critical server and I could even recreate the volumes.
> We recently introduced a code path which frees up memory in long-standing processes. It seems this is not tested when the file-snapshot feature is on; if that option is disabled the crash won't happen. "gluster volume heal <volname> info" uses the same API, but fortunately this "glfsheal" process dies as soon as the heal info output is gathered, so there is no need to call this memory-freeing code just before dying. For now we have enabled this code path (patch: http://review.gluster.org/11001) only in internal builds, not in released versions, while we stabilize that part of the code. You can take this patch for patching glfsheal.
> 
> Pranith
> > Thanks again,
> > 
> > Alessandro
> > 
> > > On 29 May 2015, at 11:54, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > 
> > > On 05/29/2015 03:16 PM, Alessandro De Salvo wrote:
> > > > Hi Pranith,
> > > > I'm definitely sure the log is correct, but you are also correct when you say there is no sign of a crash (even checking with grep!). However I do see core dumps (e.g. core.19430) in /var/log/gluster created every time I issue the heal info command. From gdb I see this:
> > > Thanks for providing the information, Alessandro. We will fix this issue. I am wondering how we can unblock you in the interim. There is a plan to release 3.7.1 in 2-3 days, I think; I can try to get this fix into that release. Let me know if you can wait that long. Another possibility is to compile just the glfsheal binary with the fix, which "gluster volume heal <volname> info" uses internally. Let me know.
> > > 
> > > Pranith.
> > > > 
> > > > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
> > > > Copyright (C) 2013 Free Software Foundation, Inc.
> > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > This is free software: you are free to change and redistribute it.
> > > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > > and "show warranty" for details.
> > > > This GDB was configured as "x86_64-redhat-linux-gnu".
> > > > For bug reporting instructions, please see:
> > > > <http://www.gnu.org/software/gdb/bugs/>...
> > > > Reading symbols from /usr/sbin/glfsheal...Reading symbols from /usr/lib/debug/usr/sbin/glfsheal.debug...done.
> > > > done.
> > > > [New LWP 19430]
> > > > [New LWP 19431]
> > > > [New LWP 19434]
> > > > [New LWP 19436]
> > > > [New LWP 19433]
> > > > [New LWP 19437]
> > > > [New LWP 19432]
> > > > [New LWP 19435]
> > > > [Thread debugging using libthread_db enabled]
> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > > Core was generated by `/usr/sbin/glfsheal adsnet-vm-01'.
> > > > Program terminated with signal 11, Segmentation fault.
> > > > #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> > > > 499             table = inode->table;
> > > > (gdb) bt
> > > > #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> > > > #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
> > > > #2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
> > > > #3  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000d450) at xlator.c:453
> > > > #4  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000e800) at xlator.c:453
> > > > #5  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000fbb0) at xlator.c:453
> > > > #6  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20010f80) at xlator.c:453
> > > > #7  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20012330) at xlator.c:453
> > > > #8  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a200136e0) at xlator.c:453
> > > > #9  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20014b30) at xlator.c:453
> > > > #10 0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20015fc0) at xlator.c:453
> > > > #11 0x00007f7a39a54eea in xlator_tree_fini (xl=<optimized out>) at xlator.c:545
> > > > #12 0x00007f7a39a90b25 in glusterfs_graph_deactivate (graph=<optimized out>) at graph.c:340
> > > > #13 0x00007f7a38d50e3c in pub_glfs_fini (fs=fs@entry=0x7f7a3a6b6010) at glfs.c:1155
> > > > #14 0x00007f7a39f18ed4 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:821
> > > > 
> > > > Thanks,
> > > > 
> > > > Alessandro
> > > > 
> > > > > On 29 May 2015, at 11:12, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > > 
> > > > > On 05/29/2015 02:37 PM, Alessandro De Salvo wrote:
> > > > > > Hi Pranith,
> > > > > > many thanks for the help!
> > > > > > The volume info of the problematic volume is the following:
> > > > > > 
> > > > > > # gluster volume info adsnet-vm-01
> > > > > > 
> > > > > > Volume Name: adsnet-vm-01
> > > > > > Type: Replicate
> > > > > > Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
> > > > > > Status: Started
> > > > > > Number of Bricks: 1 x 2 = 2
> > > > > > Transport-type: tcp
> > > > > > Bricks:
> > > > > > Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
> > > > > > Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
> > > > > > Options Reconfigured:
> > > > > > nfs.disable: true
> > > > > > features.barrier: disable
> > > > > > features.file-snapshot: on
> > > > > > server.allow-insecure: on
> > > > > Are you sure the attached log is correct? I do not see any backtrace in the log file to indicate there is a crash :-(. Could you do "grep -i crash /var/log/glusterfs/*" to see if there is some other file with the crash? If that also fails, would it be possible for you to provide the backtrace of the core by opening it with gdb?
> > > > > 
> > > > > Pranith
> > > > > > 
> > > > > > The log is attached.
> > > > > > I just wanted to add that the heal info command works fine on the other volumes hosted by the same machines, so it's just this volume that is causing problems.
> > > > > > Thanks,
> > > > > > 
> > > > > > Alessandro
> > > > > > 
> > > > > > > On 29 May 2015, at 10:50, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > > > > 
> > > > > > > On 05/29/2015 02:18 PM, Pranith Kumar Karampuri wrote:
> > > > > > > > 
> > > > > > > > On 05/29/2015 02:13 PM, Alessandro De Salvo wrote:
> > > > > > > > > Hi,
> > > > > > > > > I'm facing a strange issue with split-brain reporting. I have upgraded to 3.7.0, after stopping all gluster processes as described in the twiki, on all servers hosting the volumes. The upgrade and the restart went fine, and the volumes are accessible.
> > > > > > > > > However, I had two files in split brain that I did not heal before upgrading, so I tried a full heal with 3.7.0. The heal was launched correctly, but when I now perform a heal info there is no output, while the heal statistics say there are actually 2 files in split brain. In the logs I see something like this:
> > > > > > > > > 
> > > > > > > > > glustershd.log:
> > > > > > > > > [2015-05-29 08:28:43.008373] I [afr-self-heal-entry.c:558:afr_selfheal_entry_do] 0-adsnet-gluster-01-replicate-0: performing entry selfheal on 7fd1262d-949b-402e-96c2-ae487c8d4e27
> > > > > > > > > [2015-05-29 08:28:43.012690] W [client-rpc-fops.c:241:client3_3_mknod_cbk] 0-adsnet-gluster-01-client-1: remote operation failed: Invalid argument. Path: (null)
> > > > > > > > Hey, could you let us know the "gluster volume info" output? Please also let us know the backtrace printed in /var/log/glusterfs/glfsheal-<volname>.log.
> > > > > > > Please attach the /var/log/glusterfs/glfsheal-<volname>.log file to this thread so that I can take a look.
> > > > > > > 
> > > > > > > Pranith
> > > > > > > > 
> > > > > > > > Pranith
> > > > > > > > > 
> > > > > > > > > So it seems like the files to be healed are not correctly identified, or at least their path is null. Also, every time I issue a "gluster volume heal <volname> info" a core dump is generated in the log area. All servers are running the latest CentOS 7.
> > > > > > > > > Any idea why this might be happening and how to solve it?
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > Alessandro
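
P.S. For reference, the call I ended up commenting out as well is the glfs_fini() made from main() (frame #14 in the backtrace, glfs-heal.c:821). The snippet below is only an illustrative sketch, not the real glfs-heal.c source: the skeleton around it (argument handling, the bare glfs_new() call, the header path) is made up, and the real glfsheal does all the heal-info work where the placeholder comment sits. The only relevant part is the commented-out teardown at the end:

/* Sketch only: skeleton and names are assumptions, not the real glfs-heal.c. */
#include <stdio.h>
#include <stdlib.h>
#include <glusterfs/api/glfs.h>   /* libgfapi; header location may vary per install */

int
main (int argc, char *argv[])
{
        glfs_t *fs = NULL;

        if (argc < 2) {
                fprintf (stderr, "Usage: %s <volname>\n", argv[0]);
                return EXIT_FAILURE;
        }

        fs = glfs_new (argv[1]);
        if (!fs)
                return EXIT_FAILURE;

        /* ... glfs_set_volfile_server(), glfs_init() and the actual
         *     heal-info gathering/printing would happen here ... */

        /*
         * glfs_fini (fs);
         *
         * Commented out: per the backtrace, pub_glfs_fini() walks
         * glusterfs_graph_deactivate() -> xlator_fini_rec() into the
         * qemu-block (file-snapshot) xlator's fini(), which segfaults
         * in inode_unref().  As you pointed out, glfsheal exits right
         * after printing its output, so skipping the explicit teardown
         * simply lets the OS reclaim the memory.
         */

        return EXIT_SUCCESS;
}

With this change, on top of your patch set, glfsheal no longer seems to dump core on this volume.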