Re: Volume heal info not reporting files in split brain and core dumping, after upgrading to 3.7.0

On 05/29/2015 05:28 PM, Alessandro De Salvo wrote:
Hi Pranith,
many thanks. I have recompiled the glfsheal executable with the changes in
your patch set, but without commenting out the first glfs_fini call (the one
marked "Don't we need to comment this too?") it still segfaults. After
commenting out that one as well it seems to run fine.
For the moment I can use this patched executable, until you fix it in a
release.
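To make the workaround concrete, here is a minimal sketch of that shape; it is
not the actual glfs-heal.c source, just a short-lived libgfapi client whose
final glfs_fini() call is commented out so the crashing cleanup path is never
entered. The "localhost"/24007 volfile-server values are placeholders, not
taken from this thread.

/* Minimal sketch, assuming only the public libgfapi calls; the real
 * glfs-heal.c is larger and structured differently. */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int
main (int argc, char *argv[])
{
        glfs_t *fs = NULL;

        if (argc < 2) {
                fprintf (stderr, "Usage: %s <volname>\n", argv[0]);
                return 1;
        }

        fs = glfs_new (argv[1]);
        if (!fs)
                return 1;
        if (glfs_set_volfile_server (fs, "tcp", "localhost", 24007) != 0 ||
            glfs_init (fs) != 0)
                return 1;

        /* ... gather and print the heal-info entries here ... */

        /* glfs_fini (fs); */    /* commented out: the process exits right
                                  * away, and the crash happens inside this
                                  * cleanup path */
        return 0;
}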
Yes, you are correct; Ravi also pointed this out in the review. The latest version of the patch works correctly.

http://review.gluster.org/11002 is the fix for the actual crash which will be available in the next release.

Pranith
Many thanks!

	Alessandro

On Fri, 2015-05-29 at 16:03 +0530, Pranith Kumar Karampuri wrote:

On 05/29/2015 03:36 PM, Alessandro De Salvo wrote:

Hi Pranith,
thanks to you! 2-3 days are fine, don’t worry. However, if you can
give me the details of how to compile the glfsheal binary you
mention, we could quickly check that everything is fine with
the fix before you release. Just let me know what you prefer.
Waiting 2-3 days is not a problem for me in any case, as this is not a
critical server and I could even recreate the volumes.
We recently introduced a code path that frees up memory in
long-standing processes. It seems this was not tested with the
file-snapshot feature enabled; if that option is disabled the crash
won't happen. "gluster volume heal <volname> info" uses the same API,
but fortunately the "glfsheal" process dies as soon as the heal-info
output is gathered, so there is no need to run this memory-freeing
step just before exiting. For now we enabled this code path (patch:
http://review.gluster.org/11001) only for internal builds, not in
released versions, while we stabilize that part of the code. You can
use this patch for patching glfsheal.
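The pattern being described, roughly: only a long-standing process (a mount, a
daemon) benefits from this cleanup, while a process that exits immediately
after printing its output can skip it. A small sketch of that gating idea
follows; the GF_INTERNAL_BUILD macro and the cleanup_if_worthwhile() helper
are invented names for illustration, not what patch 11001 actually contains.

/* Sketch of the gating idea only; not GlusterFS source code. */
#include <glusterfs/api/glfs.h>

void
cleanup_if_worthwhile (glfs_t *fs, int long_running)
{
#ifdef GF_INTERNAL_BUILD                 /* hypothetical build-time gate */
        long_running = 1;                /* internal builds always exercise
                                          * the full cleanup path */
#endif
        if (long_running)
                glfs_fini (fs);          /* frees graphs, inode tables, caches */
        /* otherwise return without freeing: a short-lived helper such as
         * glfsheal is about to exit, and the OS reclaims its memory anyway */
}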

Pranith
Thanks again,


Alessandro

On 29 May 2015, at 11:54, Pranith Kumar Karampuri
<pkarampu@xxxxxxxxxx> wrote:



On 05/29/2015 03:16 PM, Alessandro De Salvo wrote:

Hi Pranith,
I’m definitely sure the log is correct, and you are also right that
there is no sign of a crash in it (even checking with grep!).
However, I see core dumps (e.g. core.19430) in /var/log/gluster,
created every time I issue the heal info command.
From gdb I see this:
Thanks for providing the information, Alessandro. We will fix this
issue. I am wondering how we can unblock you in the interim. There
is a plan to release 3.7.1 in 2-3 days, I think, and I can try to get
this fix into that release; let me know if you can wait that long.
Another possibility is to compile just the glfsheal binary (which
"gluster volume heal <volname> info" uses internally) with the fix.
Let me know.

Pranith.



GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glfsheal...Reading symbols from /usr/lib/debug/usr/sbin/glfsheal.debug...done.
done.
[New LWP 19430]
[New LWP 19431]
[New LWP 19434]
[New LWP 19436]
[New LWP 19433]
[New LWP 19437]
[New LWP 19432]
[New LWP 19435]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glfsheal adsnet-vm-01'.
Program terminated with signal 11, Segmentation fault.
#0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
499             table = inode->table;
(gdb) bt
#0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
#1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
#2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
#3  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000d450) at xlator.c:453
#4  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000e800) at xlator.c:453
#5  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000fbb0) at xlator.c:453
#6  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20010f80) at xlator.c:453
#7  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20012330) at xlator.c:453
#8  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a200136e0) at xlator.c:453
#9  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20014b30) at xlator.c:453
#10 0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20015fc0) at xlator.c:453
#11 0x00007f7a39a54eea in xlator_tree_fini (xl=<optimized out>) at xlator.c:545
#12 0x00007f7a39a90b25 in glusterfs_graph_deactivate (graph=<optimized out>) at graph.c:340
#13 0x00007f7a38d50e3c in pub_glfs_fini (fs=fs@entry=0x7f7a3a6b6010) at glfs.c:1155
#14 0x00007f7a39f18ed4 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:821
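A note on reading this trace: frame #1 is the fini() of the qemu-block xlator,
which implements the volume's "features.file-snapshot" option, and frame #0
faults on the very first dereference of the inode it was handed ("table =
inode->table"), so by the time that xlator's cleanup runs, the inode pointer
it is unreferencing no longer points at valid memory (or never was a valid
inode pointer). That is consistent with the observation earlier in the thread
that the crash does not occur when file-snapshot is disabled. The toy program
below is not GlusterFS code and not the change made in
http://review.gluster.org/11002; it only illustrates the general class of
error (an unref on an already-destroyed object), which in a large process can
surface as exactly this kind of SIGSEGV.

/* Toy illustration of a stale-inode unref; not GlusterFS source. */
#include <stdlib.h>

struct inode_table { int nr_inodes; };
struct inode       { struct inode_table *table; int ref; };

static void
toy_inode_unref (struct inode *inode)
{
        struct inode_table *table = inode->table;  /* undefined behaviour if
                                                    * inode was already freed */
        if (--inode->ref == 0)
                table->nr_inodes--;                /* real code would also lock
                                                    * the table and destroy the
                                                    * inode here */
}

int
main (void)
{
        struct inode_table tbl = { 1 };
        struct inode *ino = malloc (sizeof (*ino));

        if (!ino)
                return 1;
        ino->table = &tbl;
        ino->ref   = 1;

        free (ino);              /* teardown destroys the inode first ...    */
        toy_inode_unref (ino);   /* ... then a later fini() still unrefs it: */
        return 0;                /* use-after-free, undefined behaviour      */
}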




Thanks,


Alessandro

On 29 May 2015, at 11:12, Pranith Kumar Karampuri
<pkarampu@xxxxxxxxxx> wrote:



On 05/29/2015 02:37 PM, Alessandro De Salvo wrote:

Hi Pranith,
many thanks for the help!
The volume info of the problematic volume is the following:


# gluster volume info adsnet-vm-01
Volume Name: adsnet-vm-01
Type: Replicate
Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
Options Reconfigured:
nfs.disable: true
features.barrier: disable
features.file-snapshot: on
server.allow-insecure: on
Are you sure the attached log is correct? I do not see any
backtrace in the log file to indicate there is a crash :-(.
Could you run "grep -i crash /var/log/glusterfs/*" to see if
some other file contains the crash? If that also fails,
would it be possible for you to provide the backtrace of the
core by opening it with gdb?

Pranith

The log is attached.
I just wanted to add that the heal info command works fine
on the other volumes hosted by the same machines, so it is only
this volume that is causing problems.
Thanks,


Alessandro





On 29 May 2015, at 10:50, Pranith Kumar
Karampuri <pkarampu@xxxxxxxxxx> wrote:



On 05/29/2015 02:18 PM, Pranith Kumar Karampuri wrote:

On 05/29/2015 02:13 PM, Alessandro De Salvo wrote:
Hi,
I'm facing a strange issue with split-brain reporting.
I have upgraded to 3.7.0, after stopping all gluster
processes as described in the twiki, on all servers
hosting the volumes. The upgrade and the restart were
fine, and the volumes are accessible.
However, I had two files in split brain that I did not
heal before upgrading, so I tried a full heal with
3.7.0. The heal was launched correctly, but when I now
perform a heal info there is no output, while the
heal statistics say there are actually 2 files in
split brain. In the logs I see something like this:

glustershd.log:
[2015-05-29 08:28:43.008373] I [afr-self-heal-entry.c:558:afr_selfheal_entry_do] 0-adsnet-gluster-01-replicate-0: performing entry selfheal on 7fd1262d-949b-402e-96c2-ae487c8d4e27
[2015-05-29 08:28:43.012690] W [client-rpc-fops.c:241:client3_3_mknod_cbk] 0-adsnet-gluster-01-client-1: remote operation failed: Invalid argument. Path: (null)
Hey, could you let us know the "gluster volume info" output?
Please also share the backtrace printed
in /var/log/glusterfs/glfsheal-<volname>.log.
Please attach the /var/log/glusterfs/glfsheal-<volname>.log
file to this thread so that I can take a look.

Pranith
Pranith

So, it seems like the files to be healed are not
correctly identified, or at least their path is null.
Also, every time I issue "gluster volume heal
<volname> info", a core dump is generated in the log
area.
All servers are using the latest CentOS 7.
Any idea why this might be happening and how to solve
it?
Thanks,

    Alessandro







_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users




