Re: GlusterFS mount crash

Sorry for necrobumping this thread, but this morning I hit the same crash on my Proxmox + GlusterFS cluster. In the log I can see this:

[2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017] [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
pending frames:
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
...
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
frame : type(1) op(FSYNC)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:  
2022-11-21 07:38:00 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 10.3
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
/lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
/usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
---------
The mount point was no longer accessible, returning "Transport endpoint is not connected", and it was listed like this:
d?????????   ? ?    ?            ?            ? vmdata

I had to stop all the VMs on that Proxmox node, then stop the gluster daemon to unmount the directory; after starting the daemon and re-mounting, everything was working again.
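Roughly, this was the sequence (the mount path and hostname below are just illustrative for my setup, not a general recipe):

    # stop or migrate all VMs using the volume first, then:
    systemctl stop glusterd
    umount -l /mnt/pve/vmdata        # lazy unmount, the mount point was stuck
    systemctl start glusterd
    mount -t glusterfs g01:/vmdata /mnt/pve/vmdata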

My gluster volume info returns this:
 
Volume Name: vmdata
Type: Distributed-Disperse
Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: g01:/data/brick1/brick
Brick2: g02:/data/brick2/brick
Brick3: g03:/data/brick1/brick
Brick4: g01:/data/brick2/brick
Brick5: g02:/data/brick1/brick
Brick6: g03:/data/brick2/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.shard: enable
features.shard-block-size: 256MB
performance.read-ahead: off
performance.quick-read: off
performance.io-cache: off
server.event-threads: 2
client.event-threads: 3
performance.client-io-threads: on
performance.stat-prefetch: off
dht.force-readdirp: off
performance.force-readdirp: off
network.remote-dio: on
features.cache-invalidation: on
performance.parallel-readdir: on
performance.readdir-ahead: on

Xavi, do you think the open-behind off setting could help here? I tried to understand what it does (with no luck), and whether it could impact the performance of my VMs (I have the setup you know so well ;))
I would like to avoid more crashes like this; Gluster 10.3 had been working quite well since I installed it two weeks ago, right up until this morning.

Angel Docampo
  


On Fri, 19 Mar 2021 at 02:10, David Cunningham (<dcunningham@xxxxxxxxxxxxx>) wrote:
Hi Xavi,

Thank you for that information. We'll look at upgrading it.


On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jahernan@xxxxxxxxxx> wrote:
Hi David,

with so little information it's hard to tell, but given that there are several OPEN and UNLINK operations, it could be related to a bug in open-behind that has already been fixed in recent versions.

You can try disabling open-behind with this command:

    # gluster volume set <volname> open-behind off
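
To verify the current value before and after the change, something like this should work (same <volname> placeholder):

    # gluster volume get <volname> performance.open-behind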

But given that the version you are using is very old and unmaintained, I would recommend upgrading to at least 8.x.

Regards,

Xavi


On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:
Hello,

We have a GlusterFS 5.13 server which also mounts itself with the native FUSE client. Recently the FUSE mount crashed and we found the following in the syslog. There isn't anything logged in mnt-glusterfs.log for that time. After killing all processes with a file handle open on the filesystem we were able to unmount and then remount the filesystem successfully.
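
In case it's useful to others, the recovery was roughly this (the mount point and volume below are placeholders, not our real paths):

    # find the processes holding files open on the mount
    fuser -vm /mnt/glusterfs
    # after killing them (or stopping the services cleanly):
    umount /mnt/glusterfs            # -l for a lazy unmount if it stays busy
    mount -t glusterfs server:/volname /mnt/glusterfs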

Would anyone have advice on how to debug this crash? Thank you in advance!

Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355 times: [ frame : type(1) op(OPEN)]
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965 times: [ frame : type(1) op(OPEN)]
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095 times: [ frame : type(1) op(OPEN)]
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset: git://git.gluster.org/glusterfs.git
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs 5.13
Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
...
Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Main process exited, code=killed, status=11/SEGV
Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service: Failed with result 'signal'.
...
Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Service hold-off time over, scheduling restart.
Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service: Scheduled restart job, restart counter is at 2.
Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs sharedstorage.
Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs sharedstorage...
Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount point does not exist
Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify a mount point
Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8 /sbin/mount.glusterfs

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782


________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
