Re: Gluster 11.0 upgrade

Marcus Pedersén <marcus.pedersen@xxxxxx> · Mon, 20 Feb 2023 13:42:47 +0100

I made a recursive diff on the upgraded arbiter.

/var/lib/glusterd/vols/gds-common is the upgraded arbiter
/home/marcus/gds-common is one of the other nodes still on gluster 10

diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
5c5
< listen-port=60419
---
> listen-port=0
11c11
< brick-fsid=14764358630653534655
---
> brick-fsid=0
diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
5c5
< listen-port=0
---
> listen-port=60891
11c11
< brick-fsid=0
---
> brick-fsid=1088380223149770683
diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
1c1
< info=3948700922
---
> info=458813151
diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
3c3
<     option shared-brick-count 1
---
>     option shared-brick-count 0
diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
3c3
<     option shared-brick-count 0
---
>     option shared-brick-count 1
diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
23a24
> nfs.disable=on
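
To separate the expected per-node differences from the ones that matter, the comparison above can be reduced to a key-by-key check. The following is a minimal Python sketch and not part of the original message: parse_kv, diff_kv and NODE_LOCAL_KEYS are hypothetical names, the sample strings are illustrative excerpts rather than the full files, and treating listen-port and brick-fsid as node-local values is an assumption based on the diff output above, where each node only has them filled in for its own brick.

```python
# Assumption: these keys are only populated on the node that hosts the brick,
# so they are expected to differ between nodes.
NODE_LOCAL_KEYS = {"listen-port", "brick-fsid"}


def parse_kv(text):
    """Parse key=value lines (as seen in the diff above) into a dict."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        result[key] = value
    return result


def diff_kv(local, remote, ignore=()):
    """Return {key: (local_value, remote_value)} for keys that differ.

    A key missing on one side is reported with None for that side.
    """
    diffs = {}
    for key in sorted(set(local) | set(remote)):
        if key in ignore:
            continue
        if local.get(key) != remote.get(key):
            diffs[key] = (local.get(key), remote.get(key))
    return diffs


# Illustrative excerpts only; the values are taken from the diff output above.
brick_upgraded = parse_kv("listen-port=60419\nbrick-fsid=14764358630653534655\n")
brick_old = parse_kv("listen-port=0\nbrick-fsid=0\n")
info_upgraded = parse_kv("transport.address-family=inet\n")
info_old = parse_kv("transport.address-family=inet\nnfs.disable=on\n")

print(diff_kv(brick_upgraded, brick_old, ignore=NODE_LOCAL_KEYS))  # {}
print(diff_kv(info_upgraded, info_old))  # {'nfs.disable': (None, 'on')}
```

With the node-local keys ignored, the only remaining difference in this sketch is the nfs.disable=on line missing from info on the upgraded node, which would be consistent with the differing info= values in the cksum files.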

I also set up 3 virtual machines and configured them with gluster 10 (arbiter 1).
After that I upgraded to 11; the first 2 nodes were fine, but on the third
node I got the same behaviour: the brick never started.

Thanks for the help!

Regards
Marcus

On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
>
> Hi Marcus,
>
> On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:
> Hi Xavi,
> I stopped glusterd, ran killall glusterd glusterfs glusterfsd,
> and started glusterd again.
>
> The only log that is not empty is glusterd.log; I attach the log
> from the restart time. The brick log, glustershd.log and glfsheal-gds-common.log are empty.
>
> These are the errors in the log:
> [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
>
> Geo-replication is not set up, so I guess it is not strange that there is an error regarding georep.
> The checksum error seems natural, as the other nodes are still on version 10.
>
> No. The configurations should be identical.
>
> Can you try to compare volume definitions in /var/lib/glusterd/vols/gds-common between the upgraded server and one of the old ones ?
>
> Regards,
>
> Xavi
>
>
> My previous experience with upgrades is that the local bricks start and
> gluster is up and running. No connection with the other nodes until they are upgraded as well.
>
>
> gluster peer status, gives the output:
> Number of Peers: 2
>
> Hostname: urd-gds-032
> Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> State: Peer Rejected (Connected)
>
> Hostname: urd-gds-031
> Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> State: Peer Rejected (Connected)
>
> I guess this is because the arbiter is on version 11
> and the other 2 nodes are on version 10.
>
> Please let me know if I can provide any other information
> to try to solve this issue.
>
> Many thanks!
> Marcus
>
>
> On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> >
> > Hi Marcus,
> >
> > these errors shouldn't prevent the bricks from starting. Isn't there any other error or warning ?
> >
> > Regards,
> >
> > Xavi
> >
> > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:
> > Hi all,
> > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > OS: Debian bullseye
> >
> > Volume Name: gds-common
> > Type: Replicate
> > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: urd-gds-031:/urd-gds/gds-common
> > Brick2: urd-gds-032:/urd-gds/gds-common
> > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > Options Reconfigured:
> > cluster.granular-entry-heal: on
> > storage.fips-mode-rchecksum: on
> > transport.address-family: inet
> > nfs.disable: on
> > performance.client-io-threads: off
> >
> > I started with the arbiter node, stopped all of gluster,
> > upgraded to 11.0, and all went fine.
> > After upgrade I was able to see the other nodes and
> > all nodes were connected.
> > After a reboot on the arbiter nothing works the way it should.
> > Both brick1 and brick2 are connected to each other but have no connection
> > with the arbiter.
> > On the arbiter glusterd has started and is listening on port 24007,
> > the problem seems to be glusterfsd, it never starts!
> >
> > If I run: gluster volume status
> >
> > Status of volume: gds-common
> > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> >
> > Task Status of Volume gds-common
> > ------------------------------------------------------------------------------
> > There are no active volume tasks
> >
> >
> > In glusterd.log I find the following errors (arbiter node):
> > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> >
> > In brick/urd-gds-gds-common.log I find the following error:
> > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> >
> > I enclose both logfiles.
> >
> > How do I resolve this issue??
> >
> > Many thanks in advance!!
> >
> > Marcus
> > ---
> > E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> > ________
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://meet.google.com/cpu-eiue-hvk
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > https://lists.gluster.org/mailman/listinfo/gluster-users