Hm, I only see messages like these in glustershd.log on the two good servers:

[2024-01-17 12:18:48.912952 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-6: remote operation failed. [{path=<gfid:ee28b56c-e352-48f8-bbb5-dbf31babe073>}, {gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:18:48.913015 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-7: remote operation failed. [{path=<gfid:ee28b56c-e352-48f8-bbb5-dbf31babe073>}, {gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450335 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-10: remote operation failed. [{path=<gfid:ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0>}, {gfid=ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450771 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-9: remote operation failed. [{path=<gfid:ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0>}, {gfid=ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0}, {errno=2}, {error=No such file or directory}]

Not sure if this is important.

On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
>
> OK, I finally managed to get all servers, volumes etc. running, but it
> took a couple of restarts, cksum checks etc.
>
> One problem: a volume doesn't heal automatically, or doesn't heal at all.
>
> gluster volume status
> Status of volume: workdata
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick glusterpub1:/gluster/md3/workdata     58832     0          Y       3436
> Brick glusterpub2:/gluster/md3/workdata     59315     0          Y       1526
> Brick glusterpub3:/gluster/md3/workdata     56917     0          Y       1952
> Brick glusterpub1:/gluster/md4/workdata     59688     0          Y       3755
> Brick glusterpub2:/gluster/md4/workdata     60271     0          Y       2271
> Brick glusterpub3:/gluster/md4/workdata     49461     0          Y       2399
> Brick glusterpub1:/gluster/md5/workdata     54651     0          Y       4208
> Brick glusterpub2:/gluster/md5/workdata     49685     0          Y       2751
> Brick glusterpub3:/gluster/md5/workdata     59202     0          Y       2803
> Brick glusterpub1:/gluster/md6/workdata     55829     0          Y       4583
> Brick glusterpub2:/gluster/md6/workdata     50455     0          Y       3296
> Brick glusterpub3:/gluster/md6/workdata     50262     0          Y       3237
> Brick glusterpub1:/gluster/md7/workdata     52238     0          Y       5014
> Brick glusterpub2:/gluster/md7/workdata     52474     0          Y       3673
> Brick glusterpub3:/gluster/md7/workdata     57966     0          Y       3653
> Self-heal Daemon on localhost               N/A       N/A        Y       4141
> Self-heal Daemon on glusterpub1             N/A       N/A        Y       5570
> Self-heal Daemon on glusterpub2             N/A       N/A        Y       4139
>
> "gluster volume heal workdata info" lists a lot of files per brick.
> "gluster volume heal workdata statistics heal-count" shows thousands
> of files per brick.
> "gluster volume heal workdata enable" has no effect.
>
> gluster volume heal workdata full
> Launching heal operation to perform full self heal on volume workdata
> has been successful
> Use heal info commands to check status.
>
> -> not doing anything at all. And nothing is happening on the two "good"
> servers, e.g. in glustershd.log. Heal was working as expected on
> version 10.4, but here... silence. Does someone have an idea?
>
>
> Best regards,
> Hubert
>
> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
> <gilberto.nunes32@xxxxxxxxx> wrote:
> >
> > Ah! Indeed! You need to perform an upgrade on the clients as well.
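
Following up on that advice, a minimal sketch of how the client/op-version check could look. The target number 110000 for the 11.x series is an assumption on my part, so please confirm it against the release notes before setting it:

    # On every server and every client: confirm the installed release
    glusterfs --version

    # On a server node: show the current cluster-wide op-version
    gluster volume get all cluster.op-version

    # Only once ALL servers and clients run 11.x, raise the op-version
    # (110000 is assumed to match the 11.x series - check the release notes)
    gluster volume set all cluster.op-version 110000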
> >
> > On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> >>
> >> Good morning to those still reading :-)
> >>
> >> I found this:
> >> https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> >>
> >> There's a paragraph about "peer rejected" with the same error message,
> >> telling me: "Update the cluster.op-version". I had only updated the
> >> server nodes, but not the clients, so upgrading the cluster.op-version
> >> wasn't possible at that time. So... upgrading the clients to version
> >> 11.1 and then raising the op-version should solve the problem?
> >>
> >>
> >> Thx,
> >> Hubert
> >>
> >> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> > I just upgraded some gluster servers from version 10.4 to version 11.1,
> >> > Debian bullseye & bookworm. When only installing the packages: good,
> >> > servers, volumes etc. work as expected.
> >> >
> >> > But one needs to test whether the systems work after a daemon and/or
> >> > server restart. Well, I did a reboot, and after that the
> >> > rebooted/restarted system is "out". Log message from a working node:
> >> >
> >> > [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
> >> > [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> >> > 0-management: using the op-version 100000
> >> > [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
> >> > [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> >> > 0-glusterd: Received probe from uuid:
> >> > b71401c3-512a-47cb-ac18-473c4ba7776e
> >> > [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
> >> > [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> >> > Version of Cksums sourceimages differ. local cksum = 2204642525,
> >> > remote cksum = 1931483801 on peer gluster190
> >> > [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
> >> > [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> >> > Responded to gluster190 (0), ret: 0, op_ret: -1
> >> > [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
> >> > [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> >> > Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> >> > gluster190, port: 0
> >> >
> >> > Peer status from the rebooted node:
> >> >
> >> > root@gluster190 ~ # gluster peer status
> >> > Number of Peers: 2
> >> >
> >> > Hostname: gluster189
> >> > Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> >> > State: Peer Rejected (Connected)
> >> >
> >> > Hostname: gluster188
> >> > Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> >> > State: Peer Rejected (Connected)
> >> >
> >> > So the rebooted gluster190 is not accepted anymore, and thus does not
> >> > appear in "gluster volume status". I then followed this guide:
> >> >
> >> > https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> >> >
> >> > Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> >> > restart the glusterd service etc. Data gets copied from the other
> >> > nodes, 'gluster peer status' is OK again - but the volume info is
> >> > missing, /var/lib/glusterd/vols is empty. After syncing this dir from
> >> > another node, the volume is available again, heals start etc.
> >> >
> >> > Well, and just to be sure that everything's working as it should, I
> >> > rebooted that node again - the rebooted node is kicked out again, and
> >> > you have to go through bringing it back all over again.
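
For reference, the recovery procedure described above could look roughly like this on the rejected node. The hostnames (gluster190 rejected, gluster189 as a good peer) are taken from the quoted mail, and the rsync step assumes root ssh between the nodes; treat this as a sketch of the linked guide, not a verified recipe:

    # On the rejected node (gluster190): keep glusterd.info, remove the rest
    systemctl stop glusterd
    find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
    systemctl start glusterd

    # Re-probe a good peer so the configuration gets synced, then restart
    gluster peer probe gluster189
    systemctl restart glusterd

    # If /var/lib/glusterd/vols stays empty (as described above), copy it
    # from a good node and restart glusterd once more
    rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    systemctl restart glusterd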
> >> >
> >> > Sorry, but did I miss anything? Has someone experienced similar
> >> > problems? I'll probably downgrade to 10.4 again; that version was
> >> > working...
> >> >
> >> >
> >> > Thx,
> >> > Hubert

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users