Re: Gluster 11.0 upgrade

Marcus Pedersén <marcus.pedersen@xxxxxx> · Mon, 20 Feb 2023 09:04:13 +0100

I failed to send a copy to the list, so here it is again:

Hi Xavi,
I stopped glusterd, ran killall glusterd glusterfs glusterfsd,
and started glusterd again.

The only log that is not empty is glusterd.log; I attach the log
from the restart time. The brick log, glustershd.log and glfsheal-gds-common.log are empty.

These are the errors in the log:
[2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
[2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
[2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
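
For anyone who wants to pull only the error lines out of a larger glusterd.log, here is a minimal Python sketch. It assumes the line layout seen in the entries above (timestamp, one-letter level, optional MSGID, source location, domain, message); the names LOG_LINE, parse_line and errors_only are hypothetical helpers, not part of Gluster. Lines that do not fit the pattern, such as the backtrace-style shutdown warning, are simply skipped.

```python
import re

# Assumed layout, taken from the log lines quoted in this mail:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<domain>\S+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def errors_only(lines):
    """Keep only the entries whose level is E (error)."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "E"]


if __name__ == "__main__":
    # The path is an assumption; adjust it to where glusterd.log lives.
    with open("/var/log/glusterfs/glusterd.log") as fh:
        for entry in errors_only(fh):
            print(entry["ts"], entry["msgid"], entry["msg"])
```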

Geo-replication is not set up, so I guess it is not strange that there is an error regarding georep.
The checksum error also seems natural, as the other nodes are still on version 10.
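
To see at a glance which peers report a different volume checksum, the mismatch lines can be summarised with a short Python sketch. It assumes the exact message wording shown in the log above ("Version of Cksums ... differ"); CKSUM and cksum_mismatches are hypothetical names chosen for illustration. It only extracts the values and says nothing about why they differ.

```python
import re

# Assumed message wording, copied from the MSGID 106010 lines in this mail.
CKSUM = re.compile(
    r"Version of Cksums (?P<volume>\S+) differ\. "
    r"local cksum = (?P<local>\d+), remote cksum = (?P<remote>\d+) "
    r"on peer (?P<peer>\S+)"
)


def cksum_mismatches(log_text):
    """Return (peer, volume, local_cksum, remote_cksum) for each mismatch line."""
    return [
        (m["peer"], m["volume"], int(m["local"]), int(m["remote"]))
        for m in CKSUM.finditer(log_text)
    ]
```

Calling cksum_mismatches() on the text of glusterd.log gives one tuple per reported mismatch, which makes it easy to check whether all peers disagree with the local node in the same way.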

My previous experience with upgrades is that the local bricks start and
gluster is up and running, with no connection to the other nodes until they are upgraded as well.

gluster peer status gives the following output:
Number of Peers: 2

Hostname: urd-gds-032
Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
State: Peer Rejected (Connected)

Hostname: urd-gds-031
Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
State: Peer Rejected (Connected)
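
When checking several nodes, the peer list above can be scanned for rejected peers with a small Python sketch. It assumes the plain-text "Hostname / Uuid / State" layout shown in this mail; parse_peer_status and rejected_peers are hypothetical helper names, and the text is expected to come from saved gluster peer status output.

```python
def parse_peer_status(text):
    """Turn 'gluster peer status' text into a list of dicts, one per peer."""
    peers, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            # A Hostname line starts a new peer entry.
            current = {"Hostname": value}
            peers.append(current)
        elif current is not None and key in ("Uuid", "State"):
            current[key] = value
    return peers


def rejected_peers(text):
    """Return the hostnames whose State contains 'Peer Rejected'."""
    return [
        peer["Hostname"]
        for peer in parse_peer_status(text)
        if "Peer Rejected" in peer.get("State", "")
    ]
```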

I suppose this is because the arbiter is on version 11
and the other 2 nodes are on version 10.

Please let me know if I can provide any other information
to try to solve this issue.

Many thanks!
Marcus

On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> Hi Marcus,
>
> these errors shouldn't prevent the bricks from starting. Are there no other errors or warnings?
>
> Regards,
>
> Xavi
>
> On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:
> Hi all,
> I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> OS: Debian bullseye
>
> Volume Name: gds-common
> Type: Replicate
> Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: urd-gds-031:/urd-gds/gds-common
> Brick2: urd-gds-032:/urd-gds/gds-common
> Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> Options Reconfigured:
> cluster.granular-entry-heal: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> I started with the arbiter node, stopped all of gluster,
> upgraded to 11.0, and all went fine.
> After upgrade I was able to see the other nodes and
> all nodes were connected.
> After a reboot of the arbiter, nothing works the way it should.
> Both brick1 and brick2 are connected to each other but have no connection
> with the arbiter.
> On the arbiter, glusterd has started and is listening on port 24007.
> The problem seems to be glusterfsd: it never starts!
>
> If I run: gluster volume status
>
> Status of volume: gds-common
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>
> Task Status of Volume gds-common
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> In glusterd.log I find the following errors (arbiter node):
> [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
>
> In brick/urd-gds-gds-common.log I find the following error:
> [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
>
> I enclose both logfiles.
>
> How do I resolve this issue??
>
> Many thanks in advance!!
>
> Marcus
> ---
> När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
> E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> https://lists.gluster.org/mailman/listinfo/gluster-users

[2023-02-20 07:23:22.343689 +0000] W [glusterfsd.c:1427:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7fe540933ea7] -->/usr/sbin/glusterd(+0x125f5) [0x562911f155f5] -->/usr/sbin/glusterd(cleanup_and_exit+0x57) [0x562911f0dd77] ) 0-: received signum (15), shutting down
[2023-02-20 07:23:46.161159 +0000] I [MSGID: 100030] [glusterfsd.c:2872:main] 0-/usr/sbin/glusterd: Started running version [{arg=/usr/sbin/glusterd}, {version=11.0}, {cmdlinestr=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO}]
[2023-02-20 07:23:46.161529 +0000] I [glusterfsd.c:2562:daemonize] 0-glusterfs: Pid of current running process is 291401
[2023-02-20 07:23:46.163500 +0000] I [MSGID: 0] [glusterfsd.c:1597:volfile_init] 0-glusterfsd-mgmt: volume not found, continuing with init
[2023-02-20 07:23:46.186377 +0000] I [MSGID: 106479] [glusterd.c:1660:init] 0-management: Using /var/lib/glusterd as working directory
[2023-02-20 07:23:46.186419 +0000] I [MSGID: 106479] [glusterd.c:1664:init] 0-management: Using /var/run/gluster as pid file working directory
[2023-02-20 07:23:46.191506 +0000] I [socket.c:973:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2023-02-20 07:23:46.192171 +0000] I [socket.c:916:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 13
[2023-02-20 07:23:46.192350 +0000] I [MSGID: 106059] [glusterd.c:1923:init] 0-management: max-port override: 60999
[2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
[2023-02-20 07:23:47.237899 +0000] I [MSGID: 106513] [glusterd-store.c:2198:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 100000
[2023-02-20 07:23:47.245819 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2023-02-20 07:23:47.245877 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2023-02-20 07:23:47.245892 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2023-02-20 07:23:47.245904 +0000] W [MSGID: 106204] [glusterd-store.c:3273:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2023-02-20 07:23:47.250826 +0000] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: 7290862d-4e05-4ff7-ae4d-5f36b1c933bc
[2023-02-20 07:23:47.286862 +0000] I [MSGID: 106498] [glusterd-handler.c:3794:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2023-02-20 07:23:47.291579 +0000] I [MSGID: 106498] [glusterd-handler.c:3794:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2023-02-20 07:23:47.291640 +0000] W [MSGID: 106061] [glusterd-handler.c:3589:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2023-02-20 07:23:47.291704 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2023-02-20 07:23:47.293158 +0000] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option max-port 60999
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.listen-port 24007
 12:     option transport.socket.read-fail-log off
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-time 10
 15:     option transport-type socket
 16:     option working-directory /var/lib/glusterd
 17: end-volume
 18:
+------------------------------------------------------------------------------+
[2023-02-20 07:23:47.293147 +0000] W [MSGID: 106061] [glusterd-handler.c:3589:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2023-02-20 07:23:47.297871 +0000] I [MSGID: 101188] [event-epoll.c:643:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
[2023-02-20 07:23:47.299977 +0000] I [MSGID: 106163] [glusterd-handshake.c:1493:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
[2023-02-20 07:23:47.350835 +0000] I [MSGID: 106163] [glusterd-handshake.c:1493:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
[2023-02-20 07:23:47.359667 +0000] I [MSGID: 106490] [glusterd-handler.c:2691:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
[2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
[2023-02-20 07:23:47.360263 +0000] I [MSGID: 106493] [glusterd-handler.c:3982:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to urd-gds-031 (0), ret: 0, op_ret: -1
[2023-02-20 07:23:47.377631 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:461:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439, host: urd-gds-032, port: 0
[2023-02-20 07:23:47.386520 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:461:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf, host: urd-gds-031, port: 0
[2023-02-20 07:23:47.437850 +0000] I [MSGID: 106490] [glusterd-handler.c:2691:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
[2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
[2023-02-20 07:23:47.438328 +0000] I [MSGID: 106493] [glusterd-handler.c:3982:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to urd-gds-032 (0), ret: 0, op_ret: -1
[2023-02-20 07:23:57.076740 +0000] I [MSGID: 106061] [glusterd-utils.c:9577:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Dict get failed [{Key=count}]
[2023-02-20 07:23:57.076978 +0000] I [MSGID: 106499] [glusterd-handler.c:4535:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gds-common
[2023-02-20 07:25:22.608430 +0000] I [MSGID: 106487] [glusterd-handler.c:1452:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2023-02-20 07:26:39.156882 +0000] I [MSGID: 106061] [glusterd-utils.c:9577:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Dict get failed [{Key=count}]
[2023-02-20 07:26:39.157119 +0000] I [MSGID: 106499] [glusterd-handler.c:4535:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gds-common
[2023-02-20 07:27:22.923216 +0000] I [MSGID: 106487] [glusterd-handler.c:1452:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
