How to prevent Brick terminated by socket temporarily unavailable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'm having a frequent problem where some temporary condition causes bricks to be shut down. The health-check feature is shutting them down, and according to https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/ the brick will stay off and not be restarted (by design).

 

What I don't understand is:

  • What is causing this "Resource temporarily unavailable" in the first place. From searching the web, it sounds like a socket timeout. Have you guys seen this before?
  • If this is truly a temporary failure, why do we shut down the brick indefinitely?

 

Should I try any of the following:

  • Increase 'network.ping-timeout' or 'client.grace-timeout'
  • Disable the health check feature by setting:

# gluster volume set <VOLNAME> storage.health-check-interval 0

 

The brick log looks like this at the time it is shut down:

------------------

[2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable]

[2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down

[2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM

[2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down

------------------

 

The GlusterD log shows this shortly after:

 

------------------

[2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port
 49152
[2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available)

------------------

 

Any guidance would be greatly appreciated!

 

Best,

 

Jeff Bischoff

 

 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux