How to prevent Brick terminated by socket temporarily unavailable

Jeff Bischoff <order@xxxxxxxxx> · Thu, 16 May 2019 16:50:16 -0400

I'm having a frequent problem where some temporary condition causes bricks to be shut down. The health-check feature is shutting them down, and according to https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/ the brick will stay off and not be restarted (by design).

What I don't understand is:
1. What is causing this "Resource temporarily unavailable" error in the first place? From searching the web, it sounds like a socket timeout. Have you guys seen this before?
2. If this is truly a temporary failure, why is the brick shut down indefinitely?

Should I try any of the following?
1. Increase 'network.ping-timeout' or 'client.grace-timeout'.
2. Disable the health-check feature by setting:
 # gluster volume set <VOLNAME> storage.health-check-interval 0

The brick log looks like this at the time it is shut down:
------------------
[2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable]
[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down
[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM
[2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down
------------------
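
In case it helps anyone check whether their own bricks hit the same sequence, here is a minimal sketch of how the health-check entries could be pulled out of a brick log. It assumes the log prefix format shown in the excerpt above ([timestamp] LEVEL [MSGID: n] [file:line:function] message); the names LOG_LINE and health_check_events are my own for illustration and are not part of Gluster.

```python
import re

# Assumed layout, taken from the excerpt above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] message
# The MSGID field is optional (the glusterfsd.c line has none).
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<msg>.*)$"
)


def health_check_events(lines):
    """Yield (timestamp, level, message) for lines whose source
    function name contains 'health_check'."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and "health_check" in m.group("src"):
            yield m.group("ts"), m.group("level"), m.group("msg")


if __name__ == "__main__":
    # Two brick-log lines and one glusterd line copied from this post.
    sample = [
        "[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down",
        "[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM",
        "[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)",
    ]
    for ts, level, msg in health_check_events(sample):
        print(ts, level, msg)
```

To scan a real log, an open file object can be passed in place of the sample list; the path of the brick log depends on the installation.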

The GlusterD log shows this shortly after:

------------------
[2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152
[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)
------------------

Any guidance would be greatly appreciated!

Best,

Jeff Bischoff

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users