> Don't forget to run the geo-replication fix script, if you missed doing it
> before the upgrade.

We don't use geo-replication YET, but thank you for this thoughtful reminder.

Just a note on things like this -- we really try to do everything in a
package update, because that's how we'd have to deploy to customers in an
automated way. Having to run a script as part of the upgrade would be very
hard in a package-based workflow for a packaged solution. I'm not
complaining (I love gluster), but this is just food for thought. I can
hardly say it with a straight face, because we suffer from similar issues on
the cluster management side: updating one CM to the next is harder than it
should be, so I'm certainly not judging. Updating is always painful.

I LOVE that slowly updating our gluster servers is "just working". This will
allow a supercomputer to slowly update its infrastructure while taking no
compute nodes (using nfs-hosted squashfs images for root) down. It's really
remarkable since it's a big jump too, 7.9 to 9.3. I am impressed by this
part. It's a huge relief that I didn't have to do an intermediate jump to
gluster 8 in the middle, as that would have been nearly impossible for us to
get right.

Thank you all!!

PS: Frontier will have 21 leader nodes running gluster servers,
distributed/replicate in groups of 3, hosting nfs-exported squashfs image
objects for compute node root filesystems. Many thousands of nodes.

>
> Best Regards,
> Strahil Nikolov
>
>
> On Tue, Sep 21, 2021 at 0:46, Erik Jacobson
> <erik.jacobson@xxxxxxx> wrote:
> I pretended I'm a low-level C programmer with network and filesystem
> experience for a few hours.
>
> I'm not sure what the right solution is, but what was happening was that
> the code was trying to treat our IPV4 hosts as AF_INET6, and the family
> was incompatible with our IPV4 IP addresses. Yes, we need to move to IPV6,
> but we're hoping to do that on our own time (~50 years, like everybody
> else :)
>
> I found a chunk of the code that seemed to be force-setting us to
> AF_INET6.
>
> While I'm sure it is not 100% the correct patch, the patch attached and
> pasted below is working for me, so I'll integrate it with our internal
> build to continue testing.
>
> Please let me know if there is a configuration item I missed or a
> different way to do this. I added -devel to this email.
>
> In the previous thread, you would have seen that we're testing a
> hopeful change that will upgrade our deployed customers from gluster
> 7.9 to gluster 9.3.
>
> Thank you!! Advice on next steps would be appreciated!!
>
>
> diff -Narup glusterfs-9.3-ORIG/rpc/rpc-transport/socket/src/name.c glusterfs-9.3-NEW/rpc/rpc-transport/socket/src/name.c
> --- glusterfs-9.3-ORIG/rpc/rpc-transport/socket/src/name.c  2021-06-29 00:27:44.381408294 -0500
> +++ glusterfs-9.3-NEW/rpc/rpc-transport/socket/src/name.c   2021-09-20 16:34:28.969425361 -0500
> @@ -252,9 +252,16 @@ af_inet_client_get_remote_sockaddr(rpc_t
>      /* Need to update transport-address family if address-family is not provided
>         to command-line arguments
>      */
> +    /* HPE: This is forcing our IPV4 servers into an IPV6 address
> +     * family that is not compatible with IPV4. For now we will just set it
> +     * to AF_INET.
> +     */
> +    /*
>      if (inet_pton(AF_INET6, remote_host, &serveraddr)) {
>          sockaddr->sa_family = AF_INET6;
>      }
> +    */
> +    sockaddr->sa_family = AF_INET;
>
>      /* TODO: gf_resolve is a blocking call. kick in some
>         non blocking dns techniques */
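
Side note on the hunk above: forcing sa_family to AF_INET is, as the HPE
comment says, only a stopgap. A more family-aware check could try
inet_pton() with AF_INET first and fall back to AF_INET6, leaving anything
that is not a literal address to DNS resolution. Below is a small,
self-contained sketch of that idea; it is not glusterfs code, and
guess_literal_family() is just an illustrative name, not a function in the
tree. For reference, the family=10 in the getaddrinfo error quoted further
down is AF_INET6 on Linux, which matches the forced IPv6 family.

/* Standalone sketch: guess the address family of a host string by trying
 * inet_pton() for IPv4 before IPv6, instead of testing only AF_INET6 the
 * way the original name.c hunk does.  guess_literal_family() is a made-up
 * helper name for illustration only.
 */
#include <stdio.h>
#include <sys/socket.h> /* AF_INET, AF_INET6, AF_UNSPEC, sa_family_t */
#include <netinet/in.h> /* struct in_addr, struct in6_addr */
#include <arpa/inet.h>  /* inet_pton() */

static sa_family_t
guess_literal_family(const char *host)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, host, &v4) == 1)
        return AF_INET;   /* e.g. "172.23.0.16" from the peer file */
    if (inet_pton(AF_INET6, host, &v6) == 1)
        return AF_INET6;
    return AF_UNSPEC;     /* not a literal address; let DNS decide */
}

int
main(void)
{
    const char *hosts[] = { "172.23.0.16", "fd00::10", "leader1" };

    for (int i = 0; i < 3; i++)
        printf("%-12s -> family %d\n", hosts[i],
               (int)guess_literal_family(hosts[i]));
    return 0;
}

Whether a check like this belongs in af_inet_client_get_remote_sockaddr(),
or whether an address-family configuration option should drive it instead,
is a question for the gluster developers.
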
>
> On Mon, Sep 20, 2021 at 11:35:35AM -0500, Erik Jacobson wrote:
> > I missed the other important log snip:
> >
> > The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6]
> > 0-resolver: error in getaddrinfo [{family=10}, {ret=Address family for
> > hostname not supported}]" repeated 620 times between [2021-09-20
> > 15:49:23.720633 +0000] and [2021-09-20 15:50:41.731542 +0000]
> >
> > So I will dig into the code some here.
> >
> >
> > On Mon, Sep 20, 2021 at 10:59:30AM -0500, Erik Jacobson wrote:
> > > Hello all! I hope you are well.
> > >
> > > We are starting a new software release cycle and I am trying to find a
> > > way to upgrade customers from our build of gluster 7.9 to our build of
> > > gluster 9.3.
> > >
> > > When we deploy gluster, we forcibly remove all references to any host
> > > names and use only IP addresses. This is because, if for any reason a
> > > DNS server is unreachable, even if the peer files have IPs and DNS, it
> > > causes glusterd to be unable to reach peers properly. We can't really
> > > rely on /etc/hosts either, because customers take artistic license with
> > > their /etc/hosts files and don't realize the problems that can cause.
> > >
> > > So our deployed peer files look something like this:
> > >
> > > uuid=46a4b506-029d-4750-acfb-894501a88977
> > > state=3
> > > hostname1=172.23.0.16
> > >
> > > That is, with full intention, we avoid host names.
> > >
> > > When we upgrade to gluster 9.3, we fall over with these errors and
> > > gluster is now partitioned: the updated gluster servers can't reach
> > > anybody:
> > >
> > > [2021-09-20 15:50:41.731543 +0000] E
> > > [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS
> > > resolution failed on host 172.23.0.16
> > >
> > >
> > > As you can see, we have deliberately defined everything using IPs, but
> > > in 9.3 it appears this method fails. Are there any suggestions short of
> > > putting real host names in peer files?
> > >
> > >
> > >
> > > FYI
> > >
> > > This supercomputer will be using gluster for part of its system
> > > management. It is how we deploy the image objects (squashfs images)
> > > hosted on NFS today and served by gluster leader nodes, and we also
> > > store system logs, console logs, and other data there.
> > >
> > > https://www.olcf.ornl.gov/frontier/
> > >
> > >
> > > Erik
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users