Re: Growing cluster: peering worked, staging failed

Solving my own issue:

Staging fails because of a checksum error.
The checksum error occurs when you try to upgrade a cluster that has the option nfs.disable set, because this option was made optional in gluster 11.
You have to upgrade the whole cluster to 11. After that the peer probe is successful, because the upgrade removes the nfs.disable option on all nodes, which results in matching checksums.
Unfortunately this means there is no online upgrade procedure 😕
Findings, errors etc. are documented in this issue: https://github.com/gluster/glusterfs/issues/4386
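
To see whether the volume checksums really differ between nodes, the stored values can be compared side by side. Below is a minimal sketch in Python, not a tested tool. It assumes that each node keeps the checksum in /var/lib/glusterd/vols/<volname>/cksum as a line of the form info=<number>; verify this on your version first. The helper names parse_cksum and find_mismatches are made up for illustration, and the values in the usage part are invented, not taken from this cluster. How you collect the file contents from each node (ssh, copy and paste) is left out on purpose.

```python
from collections import defaultdict


def parse_cksum(text: str) -> int:
    """Extract the number from a cksum file body such as 'info=1234'.

    Assumption: the file holds a line 'info=<number>'.
    """
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "info":
            return int(value)
    raise ValueError("no 'info=<number>' line found")


def find_mismatches(cksum_by_node: dict[str, str]) -> dict[int, list[str]]:
    """Group nodes by checksum.

    Returns an empty dict when all nodes agree, otherwise a mapping
    from each distinct checksum to the nodes that reported it.
    """
    groups = defaultdict(list)
    for node, text in cksum_by_node.items():
        groups[parse_cksum(text)].append(node)
    return dict(groups) if len(groups) > 1 else {}


if __name__ == "__main__":
    # Invented example contents of the gv0 cksum file on each node.
    collected = {
        "node1": "info=1111111111\n",
        "node2": "info=1111111111\n",
        "node4": "info=2222222222\n",
    }
    for cksum, nodes in find_mismatches(collected).items():
        print(cksum, nodes)
```

If the function returns more than one group, the nodes disagree about the volume definition, which is the situation described above.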

Cheers,
A.



On Sunday, 29.09.2024 at 10:47 +0200, Andreas Schwibbe wrote:
Fellow gluster users,

I am trying to extend a 3-node cluster that has been serving me very reliably for a long time now.
The cluster serves two volumes:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp

Volume Name: gv1
Type: Replicate
Volume ID: 69a12600-6720-4e96-a269-931d72d4953e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Adding a peer to the cluster with
gluster peer probe node4

works.
gluster peer status and gluster pool list

show all peers connected on every node.

However, running
gluster v status

causes:
Staging failed on node4. Error: Volume gv0 does not exist
Staging failed on node4. Error: Volume gv1 does not exist

On node4 I can see that /var/lib/glusterd/vols is empty!

/var/log/glusterfs/glusterd.log shows:
[2024-09-29 07:24:32.815997 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] 0-management: volume1.brick1.origin_path missing in payload
[2024-09-29 07:24:32.816025 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1004:gd_import_new_brick_snap_details] 0-management: volume1.brick1.snap_type missing in payload
[2024-09-29 07:24:32.924139 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1093:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname_id missing in payload for gv0
[2024-09-29 07:24:32.924158 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1104:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname missing in payload for gv0
[2024-09-29 07:24:32.924175 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1115:gd_import_volume_snap_details] 0-management: volume1.snap_plugin missing in payload for gv0
[2024-09-29 07:24:32.924365 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] 0-management: volume1.brick1.origin_path missing in payload
[2024-09-29 07:24:32.924385 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1004:gd_import_new_brick_snap_details] 0-management: volume1.brick1.snap_type missing in payload
[2024-09-29 07:24:43.516131 +0000] E [MSGID: 106048] [glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] 0-management: Failed to get volinfo [{Volume=gv0}]
[2024-09-29 07:24:43.516209 +0000] E [MSGID: 106301] [glusterd-op-sm.c:5870:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Status', Status : -1
[2024-09-29 07:24:43.524640 +0000] E [MSGID: 106048] [glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] 0-management: Failed to get volinfo [{Volume=gv1}]
[2024-09-29 07:24:43.524683 +0000] E [MSGID: 106301] [glusterd-op-sm.c:5870:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Status', Status : -1
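
The log lines above follow a regular shape (timestamp, level, MSGID, source location, domain, message), so they can be summarised with a short script when the log gets long. This is a rough sketch in Python under that assumption; the regular expression is derived only from the lines quoted here, so lines in other formats (for example without a MSGID) are simply skipped. The names LOG_LINE, parse_line and count_errors are hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the glusterd.log lines quoted in this mail.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_errors(lines):
    """Count error-level lines per (MSGID, function) pair."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] == "E":
            counts[(record["msgid"], record["func"])] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/glusterfs/glusterd.log") as handle:
        for (msgid, func), n in count_errors(handle).most_common():
            print(n, msgid, func)
```

Grouping by MSGID and function separates the snapshot import messages (106061) from the actual staging failures (106048, 106301).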

The cluster is running Ubuntu 20.04 with glusterfs 9.6-ubuntu1~focal1
The new node is running Ubuntu 24.04 with glusterfs 11.1-4ubuntu0.1

I know there is a version mismatch. From the General Upgrade procedure and the v11 release notes I understood that one can upgrade directly from 9 -> 11, which I am trying to do by adding new nodes one after another and moving bricks over to the new nodes step by step.
There is also a kind of deadlock: the Launchpad Gluster Ubuntu repo does not have an installation target for v9 on Ubuntu 24.04, and v10 does not run due to Python errors when installed via the Launchpad repo. I have checked firewall, AppArmor, switches and cabling.

Any ideas or comments on the problem or the approach?

Many thanks.
A.
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
