On Tue, 2016-06-28 at 10:49 +0200, Yann LEMARIE wrote:
> Hi,
>
> I found the coredump file, but it's a 15 MB file (zipped), I can't post it on this mailing list.
>

Great. To pinpoint the exact crash location, can you please attach gdb to the extracted coredump file and share the complete backtrace with us by executing the `bt` command in the gdb shell? Apart from gdb, you may be instructed to install some debug-info packages in order to extract a useful backtrace. Attach gdb as follows:

# gdb /usr/sbin/glusterfsd <path-to-coredump-file>

If prompted, install the required packages and reattach the coredump file. Once you are at the (gdb) prompt, type 'bt' and paste the backtrace (a rough example session is sketched further down in this mail).

> Here are some parts of the report:
>
> > ProblemType: Crash
> > Architecture: amd64
> > Date: Sun Jun 26 11:27:44 2016
> > DistroRelease: Ubuntu 14.04
> > ExecutablePath: /usr/sbin/glusterfsd
> > ExecutableTimestamp: 1460982898
> > ProcCmdline: /usr/sbin/glusterfsd -s nfs05 --volfile-id cdn.nfs05.srv-cdn -p /var/lib/glusterd/vols/cdn/run/nfs05-srv-cdn.pid -S /var/run/gluster/d52ac3e6c0a3fa316a9e8360976f3af5.socket --brick-name /srv/cdn -l /var/log/glusterfs/bricks/srv-cdn.log --xlator-option *-posix.glusterd-uuid=6af63b78-a3da-459d-a909-c010e6c9072c --brick-port 49155 --xlator-option cdn-server.listen-port=49155
> > ProcCwd: /
> > ProcEnviron:
> > PATH=(custom, no user)
> > TERM=linux
> > ProcMaps:
> > 7f25f18d9000-7f25f18da000 ---p 00000000 00:00 0
> > 7f25f18da000-7f25f19da000 rw-p 00000000 00:00 0 [stack:849]
> > 7f25f19da000-7f25f19db000 ---p 00000000 00:00 0
> ...
> > ProcStatus:
> > Name: glusterfsd
> > State: D (disk sleep)
> > Tgid: 7879
> > Ngid: 0
> > Pid: 7879
> > PPid: 1
> > TracerPid: 0
> > Uid: 0 0 0 0
> > Gid: 0 0 0 0
> > FDSize: 64
> > Groups: 0
> > VmPeak: 878404 kB
> > VmSize: 878404 kB
> > VmLck: 0 kB
> > VmPin: 0 kB
> > VmHWM: 96104 kB
> > VmRSS: 90652 kB
> > VmData: 792012 kB
> > VmStk: 276 kB
> > VmExe: 84 kB
> > VmLib: 7716 kB
> > VmPTE: 700 kB
> > VmSwap: 20688 kB
> > Threads: 22
> > SigQ: 0/30034
> > SigPnd: 0000000000000000
> > ShdPnd: 0000000000000000
> > SigBlk: 0000000000004a01
> > SigIgn: 0000000000001000
> > SigCgt: 00000001800000fa
> > CapInh: 0000000000000000
> > CapPrm: 0000001fffffffff
> > CapEff: 0000001fffffffff
> > CapBnd: 0000001fffffffff
> > Seccomp: 0
> > Cpus_allowed: 7fff
> > Cpus_allowed_list: 0-14
> > Mems_allowed: 00000000,00000001
> > Mems_allowed_list: 0
> > voluntary_ctxt_switches: 3
> > nonvoluntary_ctxt_switches: 1
> > Signal: 11
> > Uname: Linux 3.13.0-44-generic x86_64
> > UserGroups:
> > CoreDump: base64
> ...
>
> Yann
>
> On 28/06/2016 09:31, Anoop C S wrote:
> > On Mon, 2016-06-27 at 15:05 +0200, Yann LEMARIE wrote:
> > > @Anoop,
> > >
> > > Where can I find the coredump file?
> > >
> > You will get hints about the crash from the entries in /var/log/messages (for example the pid of the process, the location of the coredump, etc.).
> >
> > > The crash occurred 2 times in the last 7 days, each time on a Sunday morning with no apparent cause, no increase in traffic or anything like that; the volume had been mounted for 15 days.
> > >
> > > The bricks are used as a kind of CDN, distributing small images and css files via an nginx https service (with a load balancer and 2 EC2 instances); on a Sunday morning there is not a lot of activity ...
> > >
> > From the very minimal backtrace that we have from the brick logs, I would assume that a truncate operation was being handled by the trash translator when it crashed.
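
To make the gdb step above a little more concrete, here is a rough sketch of what such a session could look like on Ubuntu 14.04. The debug package name below is only an assumption (it depends on whether GlusterFS came from the Ubuntu archive or a PPA), so in practice just install whatever packages gdb itself suggests when it complains about missing symbols:

# apt-get install gdb glusterfs-dbg    (debug package name is an assumption; adjust to your repository)
# gdb /usr/sbin/glusterfsd <path-to-coredump-file>
(gdb) bt
(gdb) thread apply all bt

`bt` prints the backtrace of the current thread (normally the one that received the signal), and `thread apply all bt` prints backtraces for every thread in case the crashing one is not obvious. If the output is long, you can run `set logging on` at the (gdb) prompt first; gdb will then also write everything to a file named gdb.txt, which you can attach here instead of pasting.
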
> >
> > > Volume infos:
> > > > root@nfs05 /var/log/glusterfs # gluster volume info cdn
> > > >
> > > > Volume Name: cdn
> > > > Type: Replicate
> > > > Volume ID: c53b9bae-5e12-4f13-8217-53d8c96c302c
> > > > Status: Started
> > > > Number of Bricks: 1 x 2 = 2
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: nfs05:/srv/cdn
> > > > Brick2: nfs06:/srv/cdn
> > > > Options Reconfigured:
> > > > performance.readdir-ahead: on
> > > > features.trash: on
> > > > features.trash-max-filesize: 20MB
> > >
> > > I don't know if there is a link with this crash problem, but I have another problem with my 2 servers that makes the GlusterFS clients disconnect (from another volume):
> > > > Jun 24 02:28:04 nfs05 kernel: [2039468.818617] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > > Jun 24 02:28:11 nfs05 kernel: [2039475.744086] net_ratelimit: 66 callbacks suppressed
> > > It seems to be a network interface problem:
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811
> > >
> > > Yann
> > >
> > > On 27/06/2016 12:59, Anoop C S wrote:
> > > > On Mon, 2016-06-27 at 09:47 +0200, Yann LEMARIE wrote:
> > > > > Hi,
> > > > >
> > > > > I have been using GlusterFS for many years and have never seen this problem, but this is the second time in one week ...
> > > > >
> > > > > I have 3 volumes with 2 bricks, and 1 volume crashed for no reason,
> > > > Did you observe the crash while mounting the volume? Or can you be more specific about what you were doing just before you saw the crash? Can you please share the output of `gluster volume info <VOLNAME>`?
> > > >
> > > > > I just have to stop/start the volume to make it up again.
> > > > > The only logs I can find are in syslog:
> > > > >
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: pending frames:
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: frame : type(0) op(10)
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: patchset: git://git.gluster.com/glusterfs.git
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: signal received: 11
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: time of crash:
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: 2016-06-26 09:27:44
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: configuration details:
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: argp 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: backtrace 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: dlfcn 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: libpthread 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: llistxattr 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: setfsid 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: spinlock 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: epoll.h 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: xattr.h 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: st_atim.tv_nsec 1
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: package-string: glusterfs 3.7.11
> > > > > > Jun 26 11:27:44 nfs05 srv-cdn[7879]: ---------
> > > > > >
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: pending frames:
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: frame : type(0) op(10)
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: patchset: git://git.gluster.com/glusterfs.git
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: signal received: 11
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: time of crash:
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: 2016-06-26 09:27:44
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: configuration details:
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: argp 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: backtrace 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: dlfcn 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: libpthread 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: llistxattr 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: setfsid 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: spinlock 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: epoll.h 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: xattr.h 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: st_atim.tv_nsec 1
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: package-string: glusterfs 3.7.11
> > > > > > Jun 26 11:27:44 nfs06 srv-cdn[1787]: ---------
> > > > >
> > > > > Thanks for your help
> > > > >
> > > > > Regards
> > > > > --
> > > > > Yann Lemarié
> > > > > iRaiser - Support Technique
> > > > >
> > > > > ylemarie@xxxxxxxxxx
> > > > > _______________________________________________
> > > > > Gluster-users mailing list
> > > > > Gluster-users@xxxxxxxxxxx
> > > > > http://www.gluster.org/mailman/listinfo/gluster-users
> > >
> > > --
> > > Yann Lemarié
> > > iRaiser - Support Technique
> > >
> > > ylemarie@xxxxxxxxxx
> > >
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-users
>
> --
> Yann Lemarié
> iRaiser - Support Technique
>
> ylemarie@xxxxxxxxxx
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users