Hi Jiri,
your problem looks pretty similar to mine, see: https://lists.gluster.org/pipermail/gluster-users/2021-February/039134.html
Any chance you also see the XFS errors in the brick logs?
For me the situation improved once I disabled brick multiplexing, but I don't see that option in your volume configuration.
Cheers Olaf
On Thu, 8 Jul 2021 at 12:28, Jiří Sléžka <jiri.slezka@xxxxxx> wrote:
Hello gluster community,
I am new to this list but have been using glusterfs for a long time as our SDS
solution for storing 80+ TiB of data. I'm also using glusterfs for a small
3-node HCI cluster with oVirt 4.4.6 and CentOS 8 (not Stream yet).
The glusterfs version here is 8.5-2.el8.x86_64.
From time to time a (I believe) random brick on a random host goes down
because of the health check. It looks like this:
[root@ovirt-hci02 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 07:13:37.408184] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 07:13:37.408407] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 16:11:14.518971] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-07 16:11:14.519200] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
and on another host:
[root@ovirt-hci01 ~]# grep "posix_health_check" /var/log/glusterfs/bricks/*
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 13:15:51.983327] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-engine-posix: health-check failed, going down
/var/log/glusterfs/bricks/gluster_bricks-engine-engine.log:[2021-07-05 13:15:51.983728] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-engine-posix: still alive! -> SIGTERM
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 01:53:35.769129] M [MSGID: 113075] [posix-helpers.c:2214:posix_health_check_thread_proc] 0-vms-posix: health-check failed, going down
/var/log/glusterfs/bricks/gluster_bricks-vms2-vms2.log:[2021-07-05 01:53:35.769819] M [MSGID: 113075] [posix-helpers.c:2232:posix_health_check_thread_proc] 0-vms-posix: still alive! -> SIGTERM
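To see at a glance which bricks fail and when, the grep output above could be summarized per brick log. The following is only a rough sketch in Python: the function name health_check_failures is made up for illustration, and the pattern assumes the log layout is exactly as shown in the excerpts (MSGID 113075, posix_health_check_thread_proc). It tolerates the line wrapping that mail clients add.

```python
import re
from collections import defaultdict

# Matches one "health-check failed" record. \s+ is used between fields so the
# pattern also works when a mail client has hard-wrapped the grep output.
FAILURE_RE = re.compile(
    r"(?P<log>\S+\.log):\[(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>[\d:.]+)\]\s+"
    r"M\s+\[MSGID:\s+113075\]\s+"
    r"\[posix-helpers\.c:\d+:posix_health_check_thread_proc\]\s+"
    r"(?P<xlator>\S+):\s+health-check\s+failed"
)


def health_check_failures(grep_output):
    """Return {brick log path: [timestamps]} for every 'health-check failed' record.

    The follow-up 'still alive! -> SIGTERM' records are skipped on purpose,
    so each brick shutdown is counted once.
    """
    failures = defaultdict(list)
    for m in FAILURE_RE.finditer(grep_output):
        failures[m["log"]].append(f"{m['date']} {m['time']}")
    return dict(failures)
```

Feeding it the saved output of the grep command from each host gives one list of failure timestamps per brick log, which makes it easier to compare against dmesg or /var/log/messages around the same times.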
I cannot link these errors to any storage/fs issue (in dmesg or
/var/log/messages), and the brick devices look healthy (smartd).
I can force-start the brick with
gluster volume start vms|engine force
and after some healing everything works fine for a few days.
Has anybody observed this behavior?
The vms volume has this structure (two bricks per host, each a separate
JBOD SSD); the engine volume has one brick on each host.
gluster volume info vms
Volume Name: vms
Type: Distributed-Replicate
Volume ID: 52032ec6-99d4-4210-8fb8-ffbd7a1e0bf7
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.0.4.11:/gluster_bricks/vms/vms
Brick2: 10.0.4.13:/gluster_bricks/vms/vms
Brick3: 10.0.4.12:/gluster_bricks/vms/vms
Brick4: 10.0.4.11:/gluster_bricks/vms2/vms2
Brick5: 10.0.4.13:/gluster_bricks/vms2/vms2
Brick6: 10.0.4.12:/gluster_bricks/vms2/vms2
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.stat-prefetch: off
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
user.cifs: off
network.ping-timeout: 30
network.remote-dio: off
performance.strict-o-direct: on
performance.low-prio-threads: 32
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
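When comparing configurations between clusters, it can help to turn the "Options Reconfigured" part of the output above into a lookup table. This is a minimal sketch in Python, assuming the output of a single volume in the layout shown; the function name parse_reconfigured_options is hypothetical. Whether a cluster-wide setting would appear in this per-volume list at all is not something the sketch can tell you.

```python
def parse_reconfigured_options(volume_info):
    """Return the 'Options Reconfigured' section of `gluster volume info` as a dict.

    Assumes single-volume output where every option line has the form
    'name: value' and the section runs until a blank line or end of text.
    """
    options = {}
    in_section = False
    for line in volume_info.splitlines():
        line = line.strip()
        if line == "Options Reconfigured:":
            in_section = True
            continue
        if in_section:
            if ": " not in line:
                break  # a blank or non-option line ends the section
            key, value = line.split(": ", 1)
            options[key] = value
    return options
```

For example, after saving the output to a string, `parse_reconfigured_options(text).get("features.shard")` returns the configured value, and `.get()` returns None for any option that is not listed.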
Cheers,
Jiri
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users