Re: Lots of mount points failing with core dumps, help!


 



On Mon, Aug 04, 2014 at 05:05:10PM +0800, Franco Broi wrote:
> 
> A bit more background to this.
> 
> I was running 3.4.3 on all the clients (120+ nodes) but I also have a
> 3.5 volume which I wanted to mount on the same nodes. The 3.4.3 client
> mounts of the 3.5 volume would sometimes hang on mount requiring a
> volume stop/start to clear. I raised this issue on this list but it was
> never resolved. I also tried to downgrade the 3.5 volume to 3.4 but that
> also didn't work.
> 
> I had a single client node running 3.5 and it was able to mount both
> volumes so I decided to update everything on the client side.
> 
> Middle of last week I did a glusterfs update from 3.4.3 to 3.5.1 and
> everything appeared to be ok. The existing 3.4.3 mounts continued to
> work and I was able to mount the 3.5 volume without any of the hanging
> problems I was seeing before. Great, I thought.
> 
> Today mount points started to fail, both for the 3.4 volume with the 3.4
> client and for the 3.5 volume with the 3.5 client.
> 
> I've been remounting the filesystems as they break but it's a pretty
> unstable environment.
> 
> BTW, is there some way to get gluster to write its core files somewhere
> other than the root filesystem? If I could do that I might at least get
> a complete core dump to run gdb on.

You can set a sysctl with a path, for example:

    # mkdir /var/cores
    # mount /dev/local_vg/cores /var/cores
    # sysctl -w kernel.core_pattern=/var/cores/core
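
Since the truncated core quoted below was expected to be at least
165773312 bytes, it may be worth checking that the target directory has
that much room before relying on it. A rough sketch, where
core_dir_has_room is a made-up helper name (not part of gluster) and
/var/cores is the example directory from above:

```shell
# Hypothetical helper (not part of gluster): succeed if directory $1 has at
# least $2 bytes available.
core_dir_has_room() {
    dir=$1
    need_bytes=$2
    # df -P keeps each filesystem on one line; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    # No output means the directory does not exist or df failed.
    [ -n "$avail_kb" ] || return 1
    # Round the requirement up to whole kilobytes.
    need_kb=$(( (need_bytes + 1023) / 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Size taken from the BFD warning in the quoted report.
if core_dir_has_room /var/cores 165773312; then
    echo "/var/cores has room for the core"
else
    echo "/var/cores is missing or too small"
fi
```

The pattern also accepts format specifiers such as %e (executable name)
and %p (PID), for example kernel.core_pattern=/var/cores/core.%e.%p,
which keeps cores from different processes apart. Note that a value set
with "sysctl -w" does not survive a reboot.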

I am not sure whether the "mismatching layouts" can cause a segmentation
fault. In any case, it would be good to get the extended attributes for
the directories in question. The xattrs contain the hash range (layout)
that determines which brick each file should be located on.

For all bricks (replace the "..." with the path for the brick):

   # getfattr -m. -ehex -d .../promax_data/115_endurance/31fasttrackstk

Please also include the output of "gluster volume info $VOLUME".
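
To compare the layouts across bricks, it can help to pull the hash range
out of the trusted.glusterfs.dht value that getfattr prints. A rough
sketch, assuming the value is four big-endian 32-bit words of which the
last two are the start and the end of the range; decode_dht_layout is a
made-up name, and the sample value is invented for illustration, not
taken from your volume:

```shell
# Hypothetical helper: print the hash range from a trusted.glusterfs.dht
# value as shown by "getfattr -e hex". The field layout is an assumption
# (four 32-bit words, the last two being start and stop).
decode_dht_layout() {
    hex=${1#0x}                          # drop the leading 0x, if any
    if [ "${#hex}" -ne 32 ]; then
        echo "expected 32 hex digits, got ${#hex}" >&2
        return 1
    fi
    start=$(printf '%s' "$hex" | cut -c17-24)   # third word
    stop=$(printf '%s' "$hex" | cut -c25-32)    # fourth word
    printf 'start=0x%s stop=0x%s\n' "$start" "$stop"
}

# Invented sample value, for illustration only.
decode_dht_layout 0x00000001000000000000000055555554
```

For a healthy directory I would expect the ranges from the different
distribute subvolumes to cover 0x00000000 to 0xffffffff between them
(bricks in the same replica set carry the same range). A brick that
shows no trusted.glusterfs.dht value at all would be consistent with the
"disk layout missing" messages in your log.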

You should also file a bug for this; a core dump should definitely not
happen.

Thanks,
Niels



>
> Cheers,
> 
> On Mon, 2014-08-04 at 12:53 +0530, Pranith Kumar Karampuri wrote: 
> > CC dht folks
> > 
> > Pranith
> > On 08/04/2014 11:52 AM, Franco Broi wrote:
> > > I've had a sudden spate of mount points failing with Transport endpoint
> > > not connected and core dumps. The dumps are so large and my root
> > > partitions so small that I haven't managed to get a decent traceback.
> > >
> > > BFD: Warning: //core.2351 is truncated: expected core file size >=
> > > 165773312, found: 154107904.
> > > [New Thread 2351]
> > > [New Thread 2355]
> > > [New Thread 2359]
> > > [New Thread 2356]
> > > [New Thread 2354]
> > > [New Thread 2360]
> > > [New Thread 2352]
> > > Cannot access memory at address 0x1700000006
> > > (gdb) where
> > > #0  glusterfs_signals_setup (ctx=0x8b17c0) at glusterfsd.c:1715
> > > Cannot access memory at address 0x7fffaa46b2e0
> > >
> > >
> > > Log file is full of messages like this:
> > >
> > > [2014-08-04 06:10:11.160482] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > [2014-08-04 06:10:11.160495] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > [2014-08-04 06:10:11.160502] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > [2014-08-04 06:10:11.160514] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > [2014-08-04 06:10:11.160522] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > [2014-08-04 06:10:11.160622] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > [2014-08-04 06:10:11.160634] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > >
> > >
> > > I'm running 3.5.1 on the client side and 3.4.3 on the server.
> > >
> > > Any quick help much appreciated.
> > >
> > > Cheers,
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > 
> 
> 



