Re: Lots of mount points failing with core dumps, help!

On Tue, Aug 12, 2014 at 08:24:44AM +0800, Franco Broi wrote:
> 
> Did a major update yesterday to 3.5.2 on all servers, and I'm happy to
> report that it went smoothly and everything seems to be working well. I
> also updated ZOL to 0.6.3 running on linux-3.14.16, and the net result
> of all the updates is a definite improvement in the speed of ls for the
> fuse client, good enough for us to switch back from using gNFS for
> interactive applications.
> 
> All my mount points look good too. No more crashes.
> 
> Thanks for all the good work; hopefully you won't be hearing much from
> me for a while!

Thanks for the update, Franco!

Niels

> 
> Cheers,
> 
> 
> On Wed, 2014-08-06 at 08:54 +0800, Franco Broi wrote: 
> > I think all the mounts that have failed were mounted with 3.4.3 prior to
> > the update. I'm not sure why they continued to work for several days
> > before failing, but remounting them with 3.5 appears to fix the problem.
> > Running fusermount -zu makes them eventually exit with a core dump.
> > 
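> > For the record, the remount cycle I've been using looks roughly like
> > this (server1 is a placeholder hostname; "data" is the volume name):

```shell
# Sketch of the remount cycle. server1 is a placeholder hostname;
# "data" is the volume name as it appears in the client logs.
MNT=/mnt/data
VOLSPEC=server1:/data

# Lazy unmount (-z) detaches the dead mount even while files are open.
# Both steps need root, hence the guards in this sketch.
fusermount -zu "$MNT" 2>/dev/null || true
mount -t glusterfs "$VOLSPEC" "$MNT" 2>/dev/null || true
```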
> > So no more live updates!
> > 
> > Cheers,
> > 
> > 
> > On Tue, 2014-08-05 at 14:24 +0800, Franco Broi wrote: 
> > > On Mon, 2014-08-04 at 12:31 +0200, Niels de Vos wrote: 
> > > > On Mon, Aug 04, 2014 at 05:05:10PM +0800, Franco Broi wrote:
> > > > > 
> > > > > A bit more background to this.
> > > > > 
> > > > > I was running 3.4.3 on all the clients (120+ nodes) but I also have a
> > > > > 3.5 volume which I wanted to mount on the same nodes. The 3.4.3 client
> > > > > mounts of the 3.5 volume would sometimes hang on mount requiring a
> > > > > volume stop/start to clear. I raised this issue on this list but it was
> > > > > never resolved. I also tried to downgrade the 3.5 volume to 3.4 but that
> > > > > also didn't work.
> > > > > 
> > > > > I had a single client node running 3.5 and it was able to mount both
> > > > > volumes so I decided to update everything on the client side.
> > > > > 
> > > > > Middle of last week I did a glusterfs update from 3.4.3 to 3.5.1 and
> > > > > everything appeared to be ok. The existing 3.4.3 mounts continued to
> > > > > work and I was able to mount the 3.5 volume without any of the hanging
> > > > > problems I was seeing before. Great, I thought.
> > > > > 
> > > > > Today mount points started to fail, both for the 3.4 volume with the 3.4
> > > > > client and for the 3.5 volume with the 3.5 client.
> > > > > 
> > > > > I've been remounting the filesystems as they break but it's a pretty
> > > > > unstable environment.
> > > > > 
> > > > > BTW, is there some way to get gluster to write its core files somewhere
> > > > > other than the root filesystem? If I could do that I might at least get
> > > > > a complete core dump to run gdb on.
> > > > 
> > > > You can set a sysctl with a path, for example:
> > > > 
> > > >     # mkdir /var/cores
> > > >     # mount /dev/local_vg/cores /var/cores
> > > >     # sysctl -w kernel.core_pattern=/var/cores/core
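> > > > To keep multiple dumps from overwriting each other, you can also add
> > > > format specifiers to the pattern; a sketch (the paths are examples):

```shell
# %e = executable name, %p = PID, %t = timestamp: each dump gets a
# unique file name instead of overwriting /var/cores/core.
CORES_DIR=/var/cores
PATTERN="$CORES_DIR/core.%e.%p.%t"

# Applying it needs root, so the privileged steps are shown commented:
#   mkdir -p "$CORES_DIR"
#   sysctl -w kernel.core_pattern="$PATTERN"
#   echo "kernel.core_pattern=$PATTERN" >> /etc/sysctl.conf   # persist

# Also make sure the per-process core size limit isn't the bottleneck:
ulimit -c unlimited 2>/dev/null || true
```

> > > > After that, each crash leaves a uniquely named core under /var/cores.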
> > > 
> > > Thanks for that.
> > > 
> > > > 
> > > > I am not sure if the "mismatching layouts" can cause a segmentation 
> > > > fault. In any case, it would be good to get the extended attributes for 
> > > > the directories in question. The xattrs contain the hash-range (layout) 
> > > > on where the files should get located.
> > > > 
> > > > For all bricks (replace the "..." with the path for the brick):
> > > > 
> > > >    # getfattr -m. -ehex -d .../promax_data/115_endurance/31fasttrackstk
> > > > 
> > > > Please also include a "gluster volume info $VOLUME".
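> > > > To gather these in one go, something like this should work (the
> > > > brick roots below are placeholders; take the real ones from
> > > > "gluster volume info"):

```shell
# Placeholder brick roots; substitute the real brick paths reported
# by "gluster volume info".
DIR=promax_data/115_endurance/31fasttrackstk

for brick in /data/brick1 /data/brick2; do
    echo "== $brick =="
    # -m . matches every xattr, -e hex prints raw hex values; the
    # trusted.glusterfs.dht value encodes the directory's hash range.
    getfattr -m . -e hex -d "$brick/$DIR" 2>&1 || true
done
```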
> > > 
> > > Please see attached.
> > > 
> > > 
> > > > 
> > > > You should also file a bug for this, core dumping should definitely not 
> > > > happen.
> > > > 
> > > > Thanks,
> > > > Niels
> > > > 
> > > > 
> > > > 
> > > > >
> > > > > Cheers,
> > > > > 
> > > > > On Mon, 2014-08-04 at 12:53 +0530, Pranith Kumar Karampuri wrote: 
> > > > > > CC dht folks
> > > > > > 
> > > > > > Pranith
> > > > > > On 08/04/2014 11:52 AM, Franco Broi wrote:
> > > > > > > I've had a sudden spate of mount points failing with Transport endpoint
> > > > > > > not connected and core dumps. The dumps are so large and my root
> > > > > > > partitions so small that I haven't managed to get a decent traceback.
> > > > > > >
> > > > > > > BFD: Warning: //core.2351 is truncated: expected core file size >=
> > > > > > > 165773312, found: 154107904.
> > > > > > > [New Thread 2351]
> > > > > > > [New Thread 2355]
> > > > > > > [New Thread 2359]
> > > > > > > [New Thread 2356]
> > > > > > > [New Thread 2354]
> > > > > > > [New Thread 2360]
> > > > > > > [New Thread 2352]
> > > > > > > Cannot access memory at address 0x1700000006
> > > > > > > (gdb) where
> > > > > > > #0  glusterfs_signals_setup (ctx=0x8b17c0) at glusterfsd.c:1715
> > > > > > > Cannot access memory at address 0x7fffaa46b2e0
> > > > > > >
> > > > > > >
> > > > > > > Log file is full of messages like this:
> > > > > > >
> > > > > > > [2014-08-04 06:10:11.160482] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > > > [2014-08-04 06:10:11.160495] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > > > > > [2014-08-04 06:10:11.160502] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > > > [2014-08-04 06:10:11.160514] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > > > > > [2014-08-04 06:10:11.160522] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > > > [2014-08-04 06:10:11.160622] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > > > > > [2014-08-04 06:10:11.160634] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > > >
> > > > > > >
> > > > > > > I'm running 3.5.1 on the client side and 3.4.3 on the server.
> > > > > > >
> > > > > > > Any quick help much appreciated.
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Gluster-users mailing list
> > > > > > > Gluster-users@xxxxxxxxxxx
> > > > > > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > > > > > 
> > > > > 
> > > > > 
> > > 
> > 
> 
> 