Dear Users,
Our brick processes seem to crash on XFS filesystem errors, and whether this happens seems to depend on how the Gluster process was started. When it occurs, Gluster also sends a message to the console saying the process will go down, but it does not actually seem to go down:
M [MSGID: 113075] [posix-helpers.c:2185:posix_health_check_thread_proc] 0-ovirt-engine-posix: health-check failed, going down
M [MSGID: 113075] [posix-helpers.c:2203:posix_health_check_thread_proc] 0-ovirt-engine-posix: still alive! -> SIGTERM
In the brick log, a message like this appears:
[posix-helpers.c:2111:posix_fs_health_check] 0-ovirt-data-posix: aio_read_cmp_buf() on /data5/gfs/bricks/brick1/ovirt-data/.glusterfs/health_check returned ret is -1 error is Structure needs cleaning
or like this:
W [MSGID: 113075] [posix-helpers.c:2111:posix_fs_health_check] 0-ovirt-mon-2-posix: aio_read_buf() on /data0/gfs/bricks/bricka/ovirt-mon-2/.glusterfs/health_check returned ret is -1 error is Success
When I check the actual file, it just seems to contain a timestamp:
cat /data0/gfs/bricks/bricka/ovirt-mon-2/.glusterfs/health_check
2021-01-28 09:08:01
I also don't see any errors in dmesg about problems accessing it.
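To check whether the filesystem misbehaves outside the brick process, the health check can be roughly reproduced by hand. The sketch below assumes the check amounts to "write a timestamp, read it back, compare", which is what the file contents and the aio_read_cmp_buf() name suggest; it is not Gluster's code. It uses synchronous os.pwrite/os.pread instead of POSIX AIO, so the read-back may be served from the page cache and is a weaker test than a direct-I/O check. The function name health_probe and the scratch path in the usage line are hypothetical; point it at a scratch file on the same filesystem, not at the brick's own .glusterfs/health_check. For reading the logs: on Linux, "Structure needs cleaning" is the strerror text for EUCLEAN (XFS reports detected corruption with this errno), and "error is Success" means errno was 0, which possibly indicates a short read or mismatch rather than a failed syscall.

```python
import os
import sys
import time


def health_probe(path: str) -> tuple[bool, str]:
    """Hypothetical probe: write a timestamp to `path`, fsync, read it back, compare.

    Returns (ok, detail): detail is "ok", the OS error text (e.g. "Structure needs
    cleaning"), or a description of a short write / read-back mismatch.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S").encode()
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            written = os.pwrite(fd, stamp, 0)        # write timestamp at offset 0
            os.fsync(fd)                             # force it to stable storage
            data = os.pread(fd, len(stamp), 0)       # read back (may hit page cache)
        finally:
            os.close(fd)
    except OSError as exc:
        return False, exc.strerror or str(exc)
    if written != len(stamp):
        return False, f"short write: {written} of {len(stamp)} bytes"
    if data != stamp:
        return False, f"read-back mismatch: {data!r} != {stamp!r}"
    return True, "ok"


if __name__ == "__main__":
    # Usage (hypothetical path): python3 health_probe.py /data0/health_probe_test
    ok, detail = health_probe(sys.argv[1])
    print("OK" if ok else f"FAILED: {detail}")
```

Running it in a loop (e.g. every few seconds, from cron or a shell `while` loop) after a fresh reboot would show whether the error appears independently of glusterd.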
When I unmount the filesystem and run xfs_repair on it, no errors or issues are reported. When I mount the filesystem again, it is also reported as a clean mount:
[2478552.169540] XFS (dm-23): Mounting V5 Filesystem
[2478552.180645] XFS (dm-23): Ending clean mount
When I kill the brick process and restart it with "gluster v start x force", the issue seems much less likely to occur. But after a fresh reboot, or when I kill the process and let glusterd start it (e.g. service glusterd start), the error seems to arise after a couple of minutes.
I am using LVM cache (in writethrough mode); maybe that's related. The disks themselves sit behind a hardware RAID controller, and I have inspected all disks for SMART errors.
Does anybody have experience with this, or a clue as to what might be causing it?
Thanks,
Olaf
________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://meet.google.com/cpu-eiue-hvk Gluster-users mailing list Gluster-users@xxxxxxxxxxx https://lists.gluster.org/mailman/listinfo/gluster-users