Re: Gluster 11.0 upgrade

Hi again,
I went ahead and upgraded the last two nodes in the cluster.
This is what I noted:
I upgraded the arbiter first, and in
/var/lib/glusterd/vols/gds-common/info
the parameter "nfs.disable=on" had been added by the upgrade, which
made the checksum fail.
I removed "nfs.disable=on" and all three nodes connected fine.
I upgraded one of the other nodes and no changes were made to
the /var/lib/glusterd/vols/gds-common/info file, so the arbiter
node and the recently upgraded node had contact.
I upgraded the last node, and on this node the parameter "nfs.disable=on"
was added to the file /var/lib/glusterd/vols/gds-common/info.
I removed "nfs.disable=on" and restarted glusterd, and the entire cluster
is up and running the way it should.
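
For anyone who wants to spot this kind of difference before restarting glusterd, here is a minimal Python sketch that compares two copies of an "info" file. It assumes the file is made of plain key=value lines (as the diff output quoted further down suggests) and that a copy from another node has been fetched by hand. The function names parse_info and diff_info are made up for this example and are not part of gluster.

```python
import sys
from pathlib import Path


def parse_info(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key] = value
    return options


def diff_info(local, remote):
    """Return (only_local, only_remote, changed) for two parsed info files."""
    only_local = {k: local[k] for k in local.keys() - remote.keys()}
    only_remote = {k: remote[k] for k in remote.keys() - local.keys()}
    changed = {
        k: (local[k], remote[k])
        for k in local.keys() & remote.keys()
        if local[k] != remote[k]
    }
    return only_local, only_remote, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are examples): script.py <local info> <copy from other node>
    local = parse_info(Path(sys.argv[1]).read_text())
    remote = parse_info(Path(sys.argv[2]).read_text())
    only_local, only_remote, changed = diff_info(local, remote)
    print("only in first file: ", only_local)
    print("only in second file:", only_remote)
    print("different values:   ", changed)
```

An option such as "nfs.disable=on" that exists on only one node would show up in one of the first two dictionaries. This only reports differences in the file contents; it does not reproduce how glusterd computes its checksum.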

The command: gluster volume get all cluster.max-op-version
Still says:

Option                                   Value
------                                   -----
cluster.max-op-version                   100000
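
If you want to check this on several nodes from a script, the two-column output above is easy to parse. The following is only a sketch: it assumes the output keeps the "Option / Value" layout shown here, and the helper names parse_volume_get and needs_op_version_bump are invented for the example. The sample text is the output pasted above, not a fresh run.

```python
def parse_volume_get(output):
    """Parse the two-column 'Option / Value' table into a dict of strings."""
    values = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank lines or anything that is not option + value
        option, value = parts
        if option == "Option" or set(option) == {"-"}:
            continue  # header row and dashed separator row
        values[option] = value
    return values


def needs_op_version_bump(current, maximum):
    """True when the cluster op-version is below the reported maximum."""
    return int(current) < int(maximum)


# Output as pasted in this mail.
SAMPLE = """\
Option                                   Value
------                                   -----
cluster.max-op-version                   100000
"""

if __name__ == "__main__":
    parsed = parse_volume_get(SAMPLE)
    print(parsed)
    print(needs_op_version_bump("100000", parsed["cluster.max-op-version"]))
```

With the values reported in this thread (op-version 100000 and max-op-version 100000) the comparison says no bump is available, which matches what I see on the command line.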

I hope that this info helps!
Please let me know if I can help out in any other way!

Regards
Marcus


On Tue, Feb 21, 2023 at 01:19:58PM +0100, Marcus Pedersén wrote:
> Hi Xavi,
> Copying the same info file worked well and the gluster 11 arbiter
> is now up and running and all the nodes are communicating
> the way they should.
>
> Just another note on something I discovered on my virt machines.
> All three nodes have been upgraded to 11.0 and are working.
> If I run:
> gluster volume get all cluster.op-version
> I get:
> Option                                   Value
> ------                                   -----
> cluster.op-version                       100000
>
> Which is correct as I have not updated the op-version,
> but if I run:
> gluster volume get all cluster.max-op-version
> I get:
> Option                                   Value
> ------                                   -----
> cluster.max-op-version                   100000
>
> I expected the max-op-version to be 110000.
> Isn't it supposed to be 110000?
> And after upgrade you should upgrade the op-version
> to 110000?
>
> Many thanks for all your help!
> Regards
> Marcus
>
>
> On Tue, Feb 21, 2023 at 09:29:28AM +0100, Xavi Hernandez wrote:
> > Hi Marcus,
> >
> > On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:
> > Hi again Xavi,
> >
> > I did some more testing on my virt machines
> > with same setup:
> > Number of Bricks: 1 x (2 + 1) = 3
> > If I do it the same way and upgrade the arbiter first,
> > I get the same behavior: the bricks do not start
> > and the other nodes do not "see" the upgraded node.
> > If I upgrade one of the other nodes (non-arbiter) and restart
> > glusterd on both the arbiter and that node, the arbiter starts
> > its bricks and connects with the other upgraded node as expected.
> > If I upgrade the last node (non-arbiter), it fails to start
> > the bricks, the same behaviour as the arbiter at first.
> > If I then copy /var/lib/glusterd/vols/<myvol> from the
> > upgraded (non-arbiter) node to the node that does not start its bricks,
> > replace /var/lib/glusterd/vols/<myvol> there with the copied directory,
> > and restart glusterd, it works nicely after that.
> > Everything then works the way it should.
> >
> > So the question is if the arbiter is treated in some other way
> > compared to the other nodes?
> >
> > It seems so, but at this point I'm not sure what could be the difference.
> >
> >
> > Some type of config is happening at the start of the glusterd that
> > makes the node fail?
> >
> > Gluster requires that all glusterd share the same configuration. In this case it seems that the "info" file in the volume definition has different contents on the servers.  One of the servers has the value "nfs.disable=on" but the others do not. This can be the difference that causes the checksum error.
> >
> > You can try to copy the "info" file from one node to the one that doesn't start and try restarting glusterd.
> >
> >
> > Do I dare to continue to upgrade my real cluster with the above described way?
> >
> > Thanks!
> >
> > Regards
> > Marcus
> >
> >
> >
> > On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > > I made a recursive diff on the upgraded arbiter.
> > >
> > > /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> > > /home/marcus/gds-common is one of the other nodes still on gluster 10
> > >
> > > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > > 5c5
> > > < listen-port=60419
> > > ---
> > > > listen-port=0
> > > 11c11
> > > < brick-fsid=14764358630653534655
> > > ---
> > > > brick-fsid=0
> > > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > > 5c5
> > > < listen-port=0
> > > ---
> > > > listen-port=60891
> > > 11c11
> > > < brick-fsid=0
> > > ---
> > > > brick-fsid=1088380223149770683
> > > diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> > > 1c1
> > > < info=3948700922
> > > ---
> > > > info=458813151
> > > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > > 3c3
> > > <     option shared-brick-count 1
> > > ---
> > > >     option shared-brick-count 0
> > > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > > 3c3
> > > <     option shared-brick-count 0
> > > ---
> > > >     option shared-brick-count 1
> > > diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> > > 23a24
> > > > nfs.disable=on
> > >
> > >
> > > I set up 3 virt machines and configured them with gluster 10 (arbiter 1).
> > > After that I upgraded to 11 and the first 2 nodes were fine, but on the third
> > > node I got the same behaviour: the brick never started.
> > >
> > > Thanks for the help!
> > >
> > > Regards
> > > Marcus
> > >
> > >
> > > On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > > > Hi Marcus,
> > > >
> > > > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:
> > > > Hi Xavi,
> > > > I stopped glusterd and killall glusterd glusterfs glusterfsd
> > > > and started glusterd again.
> > > >
> > > > The only log that is not empty is glusterd.log; I attach the log
> > > > from the restart time. The brick log, glustershd.log and glfsheal-gds-common.log are empty.
> > > >
> > > > These are the errors in the log:
> > > > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > > > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> > > >
> > > > Geo replication is not setup so I guess there is nothing strange that there is an error regarding georep.
> > > > The checksum error seems natural to be there as the other nodes are still on version 10.
> > > >
> > > > No. The configurations should be identical.
> > > >
> > > > Can you try to compare volume definitions in /var/lib/glusterd/vols/gds-common between the upgraded server and one of the old ones ?
> > > >
> > > > Regards,
> > > >
> > > > Xavi
> > > >
> > > >
> > > > My previous experience with upgrades is that the local bricks start and
> > > > gluster is up and running, with no connection to the other nodes until they are upgraded as well.
> > > >
> > > >
> > > > gluster peer status, gives the output:
> > > > Number of Peers: 2
> > > >
> > > > Hostname: urd-gds-032
> > > > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > > > State: Peer Rejected (Connected)
> > > >
> > > > Hostname: urd-gds-031
> > > > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > > > State: Peer Rejected (Connected)
> > > >
> > > > I guess this is because the arbiter is on version 11
> > > > and the other 2 nodes are on version 10.
> > > >
> > > > Please let me know if I can provide any other information
> > > > to try to solve this issue.
> > > >
> > > > Many thanks!
> > > > Marcus
> > > >
> > > >
> > > > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > > > Hi Marcus,
> > > > >
> > > > > these errors shouldn't prevent the bricks from starting. Isn't there any other error or warning ?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Xavi
> > > > >
> > > > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:
> > > > > Hi all,
> > > > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > > > OS: Debian bullseye
> > > > >
> > > > > Volume Name: gds-common
> > > > > Type: Replicate
> > > > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > > > Options Reconfigured:
> > > > > cluster.granular-entry-heal: on
> > > > > storage.fips-mode-rchecksum: on
> > > > > transport.address-family: inet
> > > > > nfs.disable: on
> > > > > performance.client-io-threads: off
> > > > >
> > > > > I started with the arbiter node, stopped all of gluster
> > > > > upgraded to 11.0 and all went fine.
> > > > > After upgrade I was able to see the other nodes and
> > > > > all nodes were connected.
> > > > > After a reboot on the arbiter nothing works the way it should.
> > > > > Both brick1 and brick2 are connected to each other but have no connection
> > > > > with the arbiter.
> > > > > On the arbiter glusterd has started and is listening on port 24007,
> > > > > the problem seems to be glusterfsd, it never starts!
> > > > >
> > > > > If I run: gluster volume status
> > > > >
> > > > > Status of volume: gds-common
> > > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > > ------------------------------------------------------------------------------
> > > > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > > > >
> > > > > Task Status of Volume gds-common
> > > > > ------------------------------------------------------------------------------
> > > > > There are no active volume tasks
> > > > >
> > > > >
> > > > > In glusterd.log I find the following errors (arbiter node):
> > > > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > > >
> > > > > In brick/urd-gds-gds-common.log I find the following error:
> > > > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > >
> > > > > I enclose both logfiles.
> > > > >
> > > > > How do I resolve this issue??
> > > > >
> > > > > Many thanks in advance!!
> > > > >
> > > > > Marcus
> > > > > ---
> > > > > När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
> > > > > E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> > > > > ________
> > > > >
> > > > >
> > > > >
> > > > > Community Meeting Calendar:
> > > > >
> > > > > Schedule -
> > > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > > > Gluster-users mailing list
> > > > > Gluster-users@xxxxxxxxxxx
> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> **************************************************
> * Marcus Pedersén                                *
> * System administrator                           *
> **************************************************
> * Interbull Centre                               *
> * ================                               *
> * Department of Animal Breeding & Genetics — SLU *
> * Box 7023, SE-750 07                            *
> * Uppsala, Sweden                                *
> **************************************************
> * Visiting address:                              *
> * Room 55614, Ulls väg 26, Ultuna                *
> * Uppsala                                        *
> * Sweden                                         *
> *                                                *
> * Tel: +46-(0)18-67 1962                         *
> *                                                *
> **************************************************
> *     ISO 9001 Bureau Veritas No SE004561-1      *
> **************************************************