Re: KVM lockups on Gluster 4.1.1

I think I have also seen this on our CentOS 7.5 systems running GlusterFS 4.1.1 (*). Has an upgrade to 4.1.2 helped? I'm trying that now.

Thanks,

Claus.

(*) libvirt/qemu log:
[2018-08-19 16:45:54.275830] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276156] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276159] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume glu-vol01-lab-client-0 with lock owner 28ae497049560000 [Invalid argument]
[2018-08-19 16:45:54.276183] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume glu-vol01-lab-client-1 with lock owner 28ae497049560000 [Invalid argument]
[2018-08-19 17:16:03.690808] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x3071a5 sent = 2018-08-19 16:45:54.276560. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 17:16:03.691113] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 17:46:03.855909] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d0f sent = 2018-08-19 17:16:03.691174. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 17:46:03.856170] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
... many repeats ... 
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-08-19 18:16:04.022526] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x307221 sent = 2018-08-19 17:46:03.861005. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 18:16:04.022788] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 18:46:04.195590] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d8a sent = 2018-08-19 18:16:04.022838. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 18:46:04.195881] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
qemu: terminating on signal 15 from pid 507
2018-08-19 19:36:59.065+0000: shutting down, reason=destroyed
2018-08-19 19:37:08.059+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 1.5.3 (qemu-kvm-1.5.3-156.el7_5.3)

At 19:37 the VM was restarted.
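
Each call_bail entry in the log above is written roughly 1800 seconds after the FINODELK request it refers to was sent, which matches the "timeout = 1800" printed on the same line. A minimal Python sketch for extracting those fields from a client log and computing the wait is below. The regular expression and the helper name parse_bail are hypothetical; they are derived only from the lines quoted in this message, not from any documented log format.

```python
import re
from datetime import datetime

# Pattern inferred from the call_bail lines quoted in this thread (assumption:
# other Gluster versions may format these lines differently).
BAIL_RE = re.compile(
    r"^\[(?P<logged>[\d-]+ [\d:.]+)\] E \[rpc-clnt\.c:\d+:call_bail\] "
    r"(?P<client>\S+): bailing out frame .*?op\((?P<op>\w+)\(\d+\)\) "
    r"xid = (?P<xid>0x[0-9a-f]+) sent = (?P<sent>[\d-]+ [\d:.]+)\. "
    r"timeout = (?P<timeout>\d+)"
)

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_bail(line):
    """Return the fields of a call_bail line, or None if the line is something else."""
    m = BAIL_RE.match(line)
    if m is None:
        return None
    logged = datetime.strptime(m["logged"], TS_FORMAT)
    sent = datetime.strptime(m["sent"], TS_FORMAT)
    return {
        "client": m["client"],
        "op": m["op"],
        "xid": m["xid"],
        "timeout": int(m["timeout"]),
        # Seconds between sending the request and giving up on it.
        "waited": (logged - sent).total_seconds(),
    }


# One line copied from the log above.
SAMPLE = (
    "[2018-08-19 17:16:03.690808] E [rpc-clnt.c:184:call_bail] "
    "0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) "
    "op(FINODELK(30)) xid = 0x3071a5 sent = 2018-08-19 16:45:54.276560. "
    "timeout = 1800 for 192.168.13.131:49152"
)

if __name__ == "__main__":
    info = parse_bail(SAMPLE)
    print(info["client"], info["op"], info["timeout"], round(info["waited"], 3))
```

Running this over the whole log would show whether every stuck lock request waits out the full timeout before the client gives up, which is consistent with the 30-minute spacing of the errors above.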



On Wed, Aug 15, 2018 at 8:25 PM Walter Deignan <WDeignan@xxxxxxxxx> wrote:
I am using Gluster to host KVM/QEMU images. I am seeing an intermittent issue where access to an image hangs. I have to do a lazy unmount of the Gluster volume to break the lock and then reset the affected virtual machine.

It happened again today, and I caught the events below in the client-side logs. Any thoughts on what might cause this? It seems to have begun after I upgraded from 3.12.10 to 4.1.1 a few weeks ago.

[2018-08-14 14:22:15.549501] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Invalid argument]
[2018-08-14 14:22:15.549576] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Invalid argument]
[2018-08-14 14:22:15.549583] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume gv1-client-4 with lock owner d89caca92b7f0000 [Invalid argument]
[2018-08-14 14:22:15.549615] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume gv1-client-5 with lock owner d89caca92b7f0000 [Invalid argument]
[2018-08-14 14:52:18.726219] E [rpc-clnt.c:184:call_bail] 2-gv1-client-4: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc5e00 sent = 2018-08-14 14:22:15.699082. timeout = 1800 for 10.35.20.106:49159
[2018-08-14 14:52:18.726254] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Transport endpoint is not connected]
[2018-08-14 15:22:25.962546] E [rpc-clnt.c:184:call_bail] 2-gv1-client-5: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc4a6d sent = 2018-08-14 14:52:18.726329. timeout = 1800 for 10.35.20.107:49164
[2018-08-14 15:22:25.962587] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Transport endpoint is not connected]
[2018-08-14 15:22:25.962618] W [MSGID: 108019] [afr-lk-common.c:601:is_blocking_locks_count_sufficient] 2-gv1-replicate-2: Unable to obtain blocking inode lock on even one child for gfid:24a48cae-53fe-4634-8fb7-0254c85ad672.
[2018-08-14 15:22:25.962668] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 3715808: FSYNC() ERR => -1 (Transport endpoint is not connected)

Volume configuration:

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 66ad703e-3bae-4e79-a0b7-29ea38e8fcfc
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: dc-vihi44:/gluster/bricks/megabrick/data
Brick2: dc-vihi45:/gluster/bricks/megabrick/data
Brick3: dc-vihi44:/gluster/bricks/brick1/data
Brick4: dc-vihi45:/gluster/bricks/brick1/data
Brick5: dc-vihi44:/gluster/bricks/brick2_1/data
Brick6: dc-vihi45:/gluster/bricks/brick2/data
Brick7: dc-vihi44:/gluster/bricks/brick3/data
Brick8: dc-vihi45:/gluster/bricks/brick3/data
Brick9: dc-vihi44:/gluster/bricks/brick4/data
Brick10: dc-vihi45:/gluster/bricks/brick4/data
Options Reconfigured:
cluster.min-free-inodes: 6%
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
user.cifs: off
cluster.choose-local: off
features.shard: on
cluster.server-quorum-ratio: 51%
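
To relate the log entries to this layout, it can help to work out which bricks sit behind gv1-replicate-2, gv1-client-4 and gv1-client-5. The Python sketch below groups the bricks listed above into replica pairs. It assumes that client and replicate subvolumes are numbered from zero in the order the bricks are listed, and that consecutive bricks form a replica set; that is my understanding of how Gluster names them, but it is not stated in this thread. The function name replica_sets is hypothetical.

```python
# Bricks exactly as listed in the volume info above, in order.
BRICKS = [
    "dc-vihi44:/gluster/bricks/megabrick/data",
    "dc-vihi45:/gluster/bricks/megabrick/data",
    "dc-vihi44:/gluster/bricks/brick1/data",
    "dc-vihi45:/gluster/bricks/brick1/data",
    "dc-vihi44:/gluster/bricks/brick2_1/data",
    "dc-vihi45:/gluster/bricks/brick2/data",
    "dc-vihi44:/gluster/bricks/brick3/data",
    "dc-vihi45:/gluster/bricks/brick3/data",
    "dc-vihi44:/gluster/bricks/brick4/data",
    "dc-vihi45:/gluster/bricks/brick4/data",
]


def replica_sets(bricks, replica_count):
    """Group bricks into consecutive replica sets.

    Assumption: client-N and replicate-N are zero-based and follow brick order.
    """
    if replica_count <= 0 or len(bricks) % replica_count:
        raise ValueError("brick count must be a multiple of the replica count")
    sets = {}
    for start in range(0, len(bricks), replica_count):
        name = f"replicate-{start // replica_count}"
        sets[name] = {
            f"client-{start + offset}": bricks[start + offset]
            for offset in range(replica_count)
        }
    return sets


if __name__ == "__main__":
    # "Number of Bricks: 5 x 2 = 10" means 5 replica sets of 2 bricks each.
    for name, members in replica_sets(BRICKS, 2).items():
        print(name, members)
```

Under those assumptions, replicate-2 maps to Brick5 and Brick6 (the brick2_1 and brick2 directories), which is the pair named in the failed unlock messages.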

-Walter Deignan
-Uline IT, Systems Architect
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Claus Jeppesen
Manager, Network Services
Datto, Inc.
p +45 6170 5901 | Copenhagen Office
www.datto.com
