Re: VM going down

On 08/05/2017 15:49, Alessandro Briosi wrote:
On 08/05/2017 12:57, Jesper Led Lauridsen TS Infra server wrote:

I don't know if this has any relation to your issue, but several times during gluster healing I have seen my VMs fail or get marked unresponsive in RHEV. My conclusion is that the load gluster puts on the VM images while checksumming during a heal results in too much latency, and the VMs fail.

 

My plan is to try sharding, so that the VM images are split into smaller files, to change the number of allowed concurrent heals (‘cluster.background-self-heal-count’), and to disable ‘cluster.self-heal-daemon’.


The thing is that there are no heal processes running, and no log entries either.
A few days ago I had a failure, and the heal process started and finished without any problems.

I do not use sharding yet.

Well, it happened again on a different volume and a different VM.

This time a self heal process was started.

Why is this happening? There are no network problems on the hosts, and they all have bonded 2x1Gbit NICs dedicated to gluster...

Is there any information I can give you to find out what happened?

This is the only mention of healing in the logs:
[2017-05-08 17:34:40.474774] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-datastore1-replicate-0: Completed data selfheal on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. sources=[1]  sinks=0 2

The VM went down about 1 1/2 hours earlier:
[2017-05-08 15:54:11.781749] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting connection from srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781749] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-datastore1-server: disconnecting connection from srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
[2017-05-08 15:54:11.781840] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on /images/101/vm-101-disk-2.qcow2
[2017-05-08 15:54:11.781838] W [inodelk.c:399:pl_inodelk_log_cleanup] 0-datastore1-server: releasing lock on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f held by {client=0x7ffa7c0051f0, pid=0 lk-owner=5c600023827f0000}
[2017-05-08 15:54:11.781863] I [MSGID: 115013] [server-helpers.c:293:do_fd_cleanup] 0-datastore1-server: fd cleanup on /images/101/vm-101-disk-1.qcow2
[2017-05-08 15:54:11.781947] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down connection srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0
[2017-05-08 15:54:11.781971] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-datastore1-server: Shutting down connection srvpve1-59243-2017/05/04-00:51:23:790337-datastore1-client-0-0-0
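
To line up the two events, the timestamps in the excerpts above can be compared mechanically. What follows is only a minimal Python sketch, assuming the bracketed layout visible in these lines (timestamp, level letter, optional MSGID, source location, message). The names LINE_RE and parse_line are made up for illustration and are not part of gluster.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the log lines quoted in this message:
# [timestamp] LEVEL [MSGID: n] [file:line:function] message
# The MSGID field is optional (the inodelk warning above has none).
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


# The first disconnect line and the self-heal line, copied from this message.
DISCONNECT = (
    "[2017-05-08 15:54:11.781749] I [MSGID: 115036] "
    "[server.c:548:server_rpc_notify] 0-datastore1-server: "
    "disconnecting connection from "
    "srvpve1-59243-2017/05/04-00:51:23:914158-datastore1-client-0-0-0"
)
HEAL = (
    "[2017-05-08 17:34:40.474774] I [MSGID: 108026] "
    "[afr-self-heal-common.c:1254:afr_log_selfheal] 0-datastore1-replicate-0: "
    "Completed data selfheal on bc8f6a7e-31e5-4b48-946c-f779a4b2e64f. "
    "sources=[1]  sinks=0 2"
)

# Time between the client disconnect and the completed self-heal.
gap = parse_line(HEAL)["ts"] - parse_line(DISCONNECT)["ts"]
print(gap)  # 1:40:28.693025
```

For these two lines the gap comes out at roughly 1 hour 40 minutes, so the self-heal completed well after the client connection was dropped. The same parser could be run over the whole brick log to list everything logged in between.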


Any hint would be greatly appreciated.

Alessandro

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users