Can you try setting "transport.address-family: inet" in /etc/glusterfs/glusterd.vol on all nodes?

About the rpms: if they are not yet built, the only other option is to build them from source.

I assume that the second try is on a fresh set of systems, without any remnants of an old Gluster install.

Best Regards,
Strahil Nikolov

On Friday, July 23, 2021, 07:55:01 GMT+3, Artem Russakovskii <archon810@xxxxxxxxx> wrote:

Hi Strahil,

I am using the repo builds from https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/ (currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them myself. Perhaps the builds at https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/ are better (currently glusterfs-9.1-lp152.112.1.x86_64.rpm); does anyone know? Neither repo currently has 9.3.

Regardless, I don't need gluster to use IPv6 if IPv4 works fine. Is there a way to make it stop trying IPv6 and use IPv4 only?

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | @ArtemR

On Thu, Jul 22, 2021 at 9:09 PM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
> Did you try with the latest 9.x? Based on the release notes, that should be 9.3.
>
> Best Regards,
> Strahil Nikolov
>
>> On Fri, Jul 23, 2021 at 3:06, Artem Russakovskii <archon810@xxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> I just filed this ticket, https://github.com/gluster/glusterfs/issues/2648, and wanted to bring it to your attention. Any feedback would be appreciated.
>>
>> Description of problem:
>> We have a 4-node replicate cluster running gluster 7.9. I'm currently setting up a new cluster on a new set of machines and went straight for gluster 9.1.
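A note for reading the glusterd log errors that follow: the failing getaddrinfo calls report {family=10}, which on Linux is AF_INET6, so glusterd is attempting IPv6-only name resolution. An illustrative check in Python (the hostname nexus2 is taken from the logs below and only resolves inside that cluster):

```python
import socket

# {family=10} in the glusterd logs is AF_INET6 on Linux, so the
# failing lookups are IPv6 (AAAA) lookups.
assert socket.AF_INET6 == 10  # Linux-specific value; differs on other OSes
assert socket.AF_INET == 2

# A lookup pinned to AF_INET6 raises gaierror ("Name or service not
# known") when the name cannot be resolved to an IPv6 address; this is
# the same error string glusterd logs for its peers.
try:
    socket.getaddrinfo("nexus2", 24007, socket.AF_INET6)
except socket.gaierror as exc:
    print("resolution failed:", exc)
```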
>> However, I was unable to probe any servers due to this error:
>>
>> [2021-07-17 00:31:05.228609 +0000] I [MSGID: 106487] [glusterd-handler.c:1160:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nexus2 24007
>> [2021-07-17 00:31:05.229727 +0000] E [MSGID: 101075] [common-utils.c:3657:gf_is_local_addr] 0-management: error in getaddrinfo [{ret=Name or service not known}]
>> [2021-07-17 00:31:05.230785 +0000] E [MSGID: 106408] [glusterd-peer-utils.c:217:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known [Unknown error -2]
>> [2021-07-17 00:31:05.353971 +0000] I [MSGID: 106128] [glusterd-handler.c:3719:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: nexus2 (24007)
>> [2021-07-17 00:31:05.375871 +0000] W [MSGID: 106061] [glusterd-handler.c:3488:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
>> [2021-07-17 00:31:05.375903 +0000] I [rpc-clnt.c:1010:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
>> [2021-07-17 00:31:05.377021 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
>> [2021-07-17 00:31:05.377043 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
>> [2021-07-17 00:31:05.377147 +0000] I [MSGID: 106498] [glusterd-handler.c:3648:glusterd_friend_add] 0-management: connect returned 0
>> [2021-07-17 00:31:05.377201 +0000] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer <nexus2> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.
>> [2021-07-17 00:31:05.377453 +0000] E [MSGID: 101032] [store.c:464:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
>>
>> I then wiped the /var/lib/glusterd dir to start clean, downgraded to 7.9, and attempted to peer probe again. This time it worked fine, proving that 7.9 works, just as it does in prod.
>> At that point, I created a volume, started it, and played around with testing to my satisfaction. Then I decided to see what would happen if I upgraded this working volume from 7.9 to 9.1.
>> The end result is:
>> * gluster volume status shows only the local gluster node and none of the remote nodes
>> * data does seem to replicate, so the connection between the servers is actually established
>> * the logs are now filled with constantly repeating messages like these:
>>
>> [2021-07-22 23:29:31.039004 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
>> [2021-07-22 23:29:31.039212 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
>> [2021-07-22 23:29:31.039304 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
>> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]" repeated 119 times between [2021-07-22 23:27:34.025983 +0000] and [2021-07-22 23:29:31.039302 +0000]
>> [2021-07-22 23:29:34.039369 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
>> [2021-07-22 23:29:34.039441 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
>> [2021-07-22 23:29:34.039558 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
>> [2021-07-22 23:29:34.039659 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
>> [2021-07-22 23:29:37.039741 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
>> [2021-07-22 23:29:37.039921 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
>> [2021-07-22 23:29:37.040015 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
>>
>> When I issue a command in the CLI:
>>
>> ==> cli.log <==
>> [2021-07-22 23:38:11.802596 +0000] I [cli.c:840:main] 0-cli: Started running gluster with version 9.1
>> **[2021-07-22 23:38:11.804007 +0000] W [socket.c:3434:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"**
>> [2021-07-22 23:38:11.906865 +0000] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
>>
>> **Mandatory info:**
>> **- The output of the `gluster volume info` command**:
>>
>> Volume Name: ap
>> Type: Replicate
>> Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 4 = 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: nexus2:/mnt/nexus2_block1/ap
>> Brick2: forge:/mnt/forge_block1/ap
>> Brick3: hive:/mnt/hive_block1/ap
>> Brick4: citadel:/mnt/citadel_block1/ap
>> Options Reconfigured:
>> performance.client-io-threads: on
>> nfs.disable: on
>> storage.fips-mode-rchecksum: on
>> transport.address-family: inet
>> cluster.self-heal-daemon: enable
>> client.event-threads: 4
>> cluster.data-self-heal-algorithm: full
>> cluster.lookup-optimize: on
>> cluster.quorum-count: 1
>> cluster.quorum-type: fixed
>> cluster.readdir-optimize: on
>> cluster.heal-timeout: 1800
>> disperse.eager-lock: on
>> features.cache-invalidation: on
>> features.cache-invalidation-timeout: 600
>> network.inode-lru-limit: 500000
>> network.ping-timeout: 7
>> network.remote-dio: enable
>> performance.cache-invalidation: on
>> performance.cache-size: 1GB
>> performance.io-thread-count: 4
>> performance.md-cache-timeout: 600
>> performance.rda-cache-limit: 256MB
>> performance.read-ahead: off
>> performance.readdir-ahead: on
>> performance.stat-prefetch: on
>> performance.write-behind-window-size: 32MB
>> server.event-threads: 4
>> cluster.background-self-heal-count: 1
>> performance.cache-refresh-timeout: 10
>> features.ctime: off
>> cluster.granular-entry-heal: enable
>>
>> - The output of the gluster volume status command:
>>
>> gluster volume status
>> Status of volume: ap
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick forge:/mnt/forge_block1/ap            49152     0          Y       2622
>> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>>
>> Task Status of Volume ap
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> - The output of the gluster volume heal command:
>>
>> gluster volume heal ap enable
>> Enable heal on volume ap has been successful
>>
>> gluster volume heal ap
>> Launching heal operation to perform index self heal on volume ap has been unsuccessful:
>> Self-heal daemon is not running. Check self-heal daemon log file.
>>
>> - The operating system / glusterfs version:
>> openSUSE 15.2, glusterfs 9.1.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police, APK Mirror, Illogical Robot LLC
>> beerpla.net | @ArtemR
>>
>> ________
>>
>> Community Meeting Calendar:
>>
>> Schedule -
>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> https://lists.gluster.org/mailman/listinfo/gluster-users
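For reference, the setting suggested at the top of the thread goes into /etc/glusterfs/glusterd.vol as an `option` line (the .vol file uses `option <name> <value>` syntax rather than the `name: value` syntax of volume options). A sketch of the management stanza; the surrounding options here are typical defaults and may differ between distros and versions:

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    # Force IPv4 for management connections; this is the line to add
    # on every node. The other options above are assumed defaults.
    option transport.address-family inet
end-volume
```

After editing the file on all nodes, restart glusterd (e.g. `systemctl restart glusterd`) for the change to take effect.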