Hm, I only see messages like these in glustershd.log on the two good servers:

[2024-01-17 12:18:48.912952 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-6: remote operation failed. [{path=<gfid:ee28b56c-e352-48f8-bbb5-dbf31babe073>}, {gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:18:48.913015 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-7: remote operation failed. [{path=<gfid:ee28b56c-e352-48f8-bbb5-dbf31babe073>}, {gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450335 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-10: remote operation failed. [{path=<gfid:ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0>}, {gfid=ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450771 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-9: remote operation failed. [{path=<gfid:ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0>}, {gfid=ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0}, {errno=2}, {error=No such file or directory}]

Not sure if this is important.

On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
>
> OK, I finally managed to get all servers, volumes etc. running, but it
> took a couple of restarts, cksum checks etc.
>
> One problem: a volume doesn't heal automatically, or doesn't heal at all.
>
> gluster volume status
> Status of volume: workdata
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterpub1:/gluster/md3/workdata     58832     0          Y       3436
> Brick glusterpub2:/gluster/md3/workdata     59315     0          Y       1526
> Brick glusterpub3:/gluster/md3/workdata     56917     0          Y       1952
> Brick glusterpub1:/gluster/md4/workdata     59688     0          Y       3755
> Brick glusterpub2:/gluster/md4/workdata     60271     0          Y       2271
> Brick glusterpub3:/gluster/md4/workdata     49461     0          Y       2399
> Brick glusterpub1:/gluster/md5/workdata     54651     0          Y       4208
> Brick glusterpub2:/gluster/md5/workdata     49685     0          Y       2751
> Brick glusterpub3:/gluster/md5/workdata     59202     0          Y       2803
> Brick glusterpub1:/gluster/md6/workdata     55829     0          Y       4583
> Brick glusterpub2:/gluster/md6/workdata     50455     0          Y       3296
> Brick glusterpub3:/gluster/md6/workdata     50262     0          Y       3237
> Brick glusterpub1:/gluster/md7/workdata     52238     0          Y       5014
> Brick glusterpub2:/gluster/md7/workdata     52474     0          Y       3673
> Brick glusterpub3:/gluster/md7/workdata     57966     0          Y       3653
> Self-heal Daemon on localhost               N/A       N/A        Y       4141
> Self-heal Daemon on glusterpub1             N/A       N/A        Y       5570
> Self-heal Daemon on glusterpub2             N/A       N/A        Y       4139
>
> "gluster volume heal workdata info" lists a lot of files per brick.
> "gluster volume heal workdata statistics heal-count" shows thousands
> of files per brick.
> "gluster volume heal workdata enable" has no effect.
>
> gluster volume heal workdata full
> Launching heal operation to perform full self heal on volume workdata
> has been successful
> Use heal info commands to check status.
>
> -> not doing anything at all. And nothing is happening on the two "good"
> servers, e.g. in glustershd.log. Heal was working as expected on
> version 10.4, but here... silence. Does someone have an idea?
>
>
> Best regards,
> Hubert
>
> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
> <gilberto.nunes32@xxxxxxxxx> wrote:
> >
> > Ah! Indeed! You need to perform an upgrade on the clients as well.
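
Following up on that advice, a minimal sketch of how the client/op-version check could look. The target number 110000 for the 11.x series is an assumption on my part, so please confirm it against the release notes before setting it:

    # On every server and every client: confirm the installed release
    glusterfs --version

    # On a server node: show the current cluster-wide op-version
    gluster volume get all cluster.op-version

    # Only once ALL servers and clients run 11.x, raise the op-version
    # (110000 is assumed to match the 11.x series - check the release notes)
    gluster volume set all cluster.op-version 110000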
> >
> > On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> >>
> >> Good morning to those still reading :-)
> >>
> >> I found this:
> >> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> >>
> >> There's a paragraph about "peer rejected" with the same error message,
> >> telling me: "Update the cluster.op-version". I had only updated the
> >> server nodes, but not the clients, so upgrading the cluster.op-version
> >> wasn't possible at that time. So... upgrading the clients to version
> >> 11.1 and then raising the op-version should solve the problem?
> >>
> >>
> >> Thx,
> >> Hubert
> >>
> >> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> > I just upgraded some gluster servers from version 10.4 to version 11.1,
> >> > Debian bullseye & bookworm. When only installing the packages: good,
> >> > servers, volumes etc. work as expected.
> >> >
> >> > But one needs to test whether the systems work after a daemon and/or
> >> > server restart. Well, I did a reboot, and after that the
> >> > rebooted/restarted system is "out". Log message from a working node:
> >> >
> >> > [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
> >> > [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> >> > 0-management: using the op-version 100000
> >> > [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
> >> > [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> >> > 0-glusterd: Received probe from uuid:
> >> > b71401c3-512a-47cb-ac18-473c4ba7776e
> >> > [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
> >> > [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> >> > Version of Cksums sourceimages differ. local cksum = 2204642525,
> >> > remote cksum = 1931483801 on peer gluster190
> >> > [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
> >> > [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> >> > Responded to gluster190 (0), ret: 0, op_ret: -1
> >> > [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
> >> > [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> >> > Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> >> > gluster190, port: 0
> >> >
> >> > Peer status from the rebooted node:
> >> >
> >> > root@gluster190 ~ # gluster peer status
> >> > Number of Peers: 2
> >> >
> >> > Hostname: gluster189
> >> > Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> >> > State: Peer Rejected (Connected)
> >> >
> >> > Hostname: gluster188
> >> > Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> >> > State: Peer Rejected (Connected)
> >> >
> >> > So the rebooted gluster190 is not accepted anymore, and thus does not
> >> > appear in "gluster volume status". I then followed this guide:
> >> >
> >> > https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> >> >
> >> > Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> >> > restart the glusterd service etc. Data gets copied from the other
> >> > nodes, 'gluster peer status' is OK again - but the volume info is
> >> > missing, /var/lib/glusterd/vols is empty. After syncing this dir from
> >> > another node, the volume is available again, heals start etc.
> >> >
> >> > Well, and just to be sure that everything's working as it should, I
> >> > rebooted that node again - the rebooted node is kicked out again, and
> >> > you have to go through bringing it back all over again.
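
For reference, the recovery procedure described above could look roughly like this on the rejected node. The hostnames (gluster190 rejected, gluster189 as a good peer) are taken from the quoted mail, and the rsync step assumes root ssh between the nodes; treat this as a sketch of the linked guide, not a verified recipe:

    # On the rejected node (gluster190): keep glusterd.info, remove the rest
    systemctl stop glusterd
    find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
    systemctl start glusterd

    # Re-probe a good peer so the configuration gets synced, then restart
    gluster peer probe gluster189
    systemctl restart glusterd

    # If /var/lib/glusterd/vols stays empty (as described above), copy it
    # from a good node and restart glusterd once more
    rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    systemctl restart glusterd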
> >> >
> >> > Sorry, but did I miss anything? Has someone experienced similar
> >> > problems? I'll probably downgrade to 10.4 again; that version was
> >> > working...
> >> >
> >> >
> >> > Thx,
> >> > Hubert

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users