Re: KVM lockups on Gluster 4.1.1


 



Ran into this issue too, on 4.1.5 with an arbiter setup. We also could not run a statedump because it failed with "Segmentation fault".

Tried with 3.12.13 and had issues with locked files there as well. We were able to do a statedump and found that some of our files were "BLOCKED" (xlator.features.locks.vol-locks.inode). Attached is part of the statedump.

Also tried clearing the locks using clear-locks, which did remove the lock, but as soon as I tried to cat the file, it got locked again and the cat process hung.
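For reference, the clear-locks command takes the volume name, a path relative to the volume root, the lock kind (blocked, granted or all) and the lock type (inode, entry or posix). Below is a minimal Python sketch that assembles that command, assuming the syntax from the Gluster file-locks troubleshooting guide linked later in this thread. The helpers `clear_locks_argv` and `clear_locks` are hypothetical, and the volume and path in the usage note are placeholders.

```python
import subprocess
from typing import List, Optional

KINDS = {"blocked", "granted", "all"}
LOCK_TYPES = {"inode", "entry", "posix"}

def clear_locks_argv(volume: str, path: str, kind: str, lock_type: str,
                     arg: Optional[str] = None) -> List[str]:
    """Build argv for: gluster volume clear-locks VOL PATH kind KIND TYPE [ARG]."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {sorted(KINDS)}")
    if lock_type not in LOCK_TYPES:
        raise ValueError(f"lock type must be one of {sorted(LOCK_TYPES)}")
    if not path.startswith("/"):
        # The path is given relative to the volume root, not the client mount point.
        raise ValueError("path must start with '/' (relative to the volume root)")
    argv = ["gluster", "volume", "clear-locks", volume, path, "kind", kind, lock_type]
    if arg is not None:
        argv.append(arg)  # optional range (inode/posix) or basename (entry)
    return argv

def clear_locks(volume: str, path: str, kind: str, lock_type: str,
                arg: Optional[str] = None) -> subprocess.CompletedProcess:
    # Assumes the gluster CLI is installed and this runs as root on a server node.
    return subprocess.run(clear_locks_argv(volume, path, kind, lock_type, arg),
                          capture_output=True, text=True, check=False)
```

For example, `clear_locks_argv("testvol", "/balamak.img", "blocked", "inode")` yields the arguments for `gluster volume clear-locks testvol /balamak.img kind blocked inode`. As described above, clearing only helps until a client asks for the lock again.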

On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov <dm@xxxxxxxxxx> wrote:
28.08.2018 10:43, Amar Tumballi wrote:


On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov <dm@xxxxxxxxxx> wrote:
Hello!


Yesterday we hit something like this on 4.1.2.

CentOS 7.5.


The volume is replicated: two bricks and one arbiter.


We rebooted the arbiter, waited for the heal to finish, and tried to live-migrate a VM to another node (we run VMs on the Gluster nodes):


[2018-08-27 09:56:22.085411] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-pool-server: accepted client from CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0 (version: 4.1.2)
[2018-08-27 09:56:22.107609] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection from CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:56:22.107747] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-pool-server: Shutting down connection CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:58:37.905829] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection from CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:58:37.905926] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc550000}
[2018-08-27 09:58:37.905959] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc550000}
[2018-08-27 09:58:37.905979] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc550000}
[2018-08-27 09:58:37.905997] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc550000}
[2018-08-27 09:58:37.906016] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc550000}
[2018-08-27 09:58:37.906034] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc550000}
[2018-08-27 09:58:37.906056] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc550000}
[2018-08-27 09:58:37.906079] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc550000}
[2018-08-27 09:58:37.906098] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2868a8d7bc550000}
[2018-08-27 09:58:37.906121] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f80bd7bc550000}
...

[2018-08-27 09:58:37.907375] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28a8cdd6bc550000}
[2018-08-27 09:58:37.907393] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880cdd6bc550000}
[2018-08-27 09:58:37.907476] I [socket.c:3837:socket_submit_reply] 0-tcp.pool-server: not connected (priv->connected = -1)
[2018-08-27 09:58:37.907520] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xcb88cb, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.910727] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] ) 0-: Reply submission failed
[2018-08-27 09:58:37.910814] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xcb88ce, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.910861] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] ) 0-: Reply submission failed
[2018-08-27 09:58:37.910904] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xcb88cf, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.910940] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] ) 0-: Reply submission failed
[2018-08-27 09:58:37.910979] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xcb88d1, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.911012] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] ) 0-: Reply submission failed
[2018-08-27 09:58:37.911050] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xcb88d8, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 30) to rpc-transport (tcp.pool-server)
[2018-08-27 09:58:37.911083] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] ) 0-: Reply submission failed
[2018-08-27 09:58:37.916217] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x20084) [0x7ffb64379084] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x605ba) [0x7ffb5fddf5ba] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7ffb5fd89fce] ) 0-: Reply submission failed
[2018-08-27 09:58:37.916520] I [MSGID: 115013] [server-helpers.c:286:do_fd_cleanup] 0-pool-server: fd cleanup on /balamak.img


After this, I/O on /balamak.img was blocked.
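The pl_inodelk_log_cleanup warnings above show the brick releasing many inode locks on a single GFID, all held by the disconnected client with pid 30292. In a long brick log, a tally per GFID and client makes that pattern easier to spot. Below is a minimal Python sketch, assuming the message format quoted in this thread. The regex and `count_released_locks` are hypothetical, and other Gluster versions may word the message differently.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed format, taken from the 4.1.2 brick log quoted above:
# ... [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on <gfid>
#     held by {client=0x..., pid=N lk-owner=...}
CLEANUP_RE = re.compile(
    r"pl_inodelk_log_cleanup\].*?releasing lock on (?P<gfid>[0-9a-f-]{36}) "
    r"held by \{client=(?P<client>0x[0-9a-f]+), pid=(?P<pid>-?\d+) "
    r"lk-owner=(?P<owner>[0-9a-f]+)\}"
)

def count_released_locks(lines: Iterable[str]) -> Counter:
    """Count released inodelks per (gfid, client, pid); non-matching lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = CLEANUP_RE.search(line)
        if m:
            counts[(m["gfid"], m["client"], int(m["pid"]))] += 1
    return counts
```

Any iterable of lines works, such as an open brick log file. A single key with a large count points at the client whose disconnect dropped the locks.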


The only solution we found was to reboot all 3 nodes.


Is there any bug report in Bugzilla we can add our logs to?


Not aware of such bugs!
 
Is it possible to turn off these locks?


Not sure, will get back on this one!


BTW, found this link:
https://docs.gluster.org/en/v3/Troubleshooting/troubleshooting-filelocks/

Tried it on another (test) cluster:

 [root@marduk ~]# gluster volume statedump pool
Segmentation fault (core dumped)
 

4.1.2 too...

Something is wrong here.


 
Thank you!




_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users



--
Amar Tumballi (amarts)


[xlator.features.locks.testvol-locks.inode]
path=/_admin/shared/testvol.keystore.lock
mandatory=0
inodelk-count=2
lock-dump.domain.domain=testvol-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=700a00f0d07f0000, client=0x7fb1000b1190, connection-id=eng-test-hot-cold-test-standby.saas.testvolcloud.com-3852-2018/09/18-18:43:32:168282-testvol-client-0-0-0, granted at 2018-09-18 18:44:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 2114, owner=d0c000004b7f0000, client=0x7fb1000b5320, connection-id=eng-test-hot-cold-test.saas.testvolcloud.com-4423-2018/09/18-18:43:33:251627-testvol-client-0-0-0, blocked at 2018-09-18 18:53:04
lock-dump.domain.domain=testvol-replicate-0:self-heal
lock-dump.domain.domain=testvol-replicate-0:metadata
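The attached statedump lists each inode lock with its state. The ACTIVE entry (granted at 18:44:23) holds a WRITE lock with start=0, len=0, which under the usual fcntl convention covers the whole file. The BLOCKED entry from pid 2114 has been waiting since 18:53:04. Below is a minimal Python sketch for pulling such entries out of a statedump, assuming the `inodelk.inodelk[N](STATE)=...` line layout shown here. `parse_inodelks` and `InodeLock` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Lines such as: inodelk.inodelk[1](BLOCKED)=type=WRITE, ..., blocked at 2018-09-18 18:53:04
LOCK_RE = re.compile(r"^inodelk\.inodelk\[(\d+)\]\((ACTIVE|BLOCKED)\)=(.*)$")
# key=value pairs; tolerates the "pid = N" spacing seen in the dump
KV_RE = re.compile(r"([\w-]+)\s*=\s*([^,]+)")
# trailing "granted at ..." / "blocked at ..." timestamp
TIME_RE = re.compile(r"\b(?:granted|blocked) at (\S+ \S+)")

@dataclass
class InodeLock:
    path: Optional[str]
    domain: Optional[str]
    state: str              # "ACTIVE" or "BLOCKED"
    since: Optional[str]    # granted/blocked timestamp as printed in the dump
    fields: Dict[str, str] = field(default_factory=dict)

def parse_inodelks(text: str) -> List[InodeLock]:
    """Collect inodelk entries, tracking the current path and lock domain."""
    path: Optional[str] = None
    domain: Optional[str] = None
    locks: List[InodeLock] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            path = domain = None  # a new statedump section begins
        elif line.startswith("path="):
            path = line[len("path="):]
        elif line.startswith("lock-dump.domain.domain="):
            domain = line[len("lock-dump.domain.domain="):]
        else:
            m = LOCK_RE.match(line)
            if m:
                body = m.group(3)
                t = TIME_RE.search(body)
                locks.append(InodeLock(path, domain, m.group(2),
                                       t.group(1) if t else None,
                                       dict(KV_RE.findall(body))))
    return locks
```

Filtering the result on `state == "BLOCKED"` picks out the waiting lock. The ACTIVE entry with the same path and domain is presumably the one it is queued behind.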
