Hi,
Recently, my company needed to change the hostnames used in our Gluster pool. Initially, we had two Gluster nodes called storage1 and storage2. Our volumes used two bricks: storage1:/MYVOLUME and storage2:/MYVOLUME. We put the storage1 and storage2 IPs in the /etc/hosts file of our nodes and of our client servers.
After some time, more client servers started using Gluster, and we discovered that using hostnames without a domain (via /etc/hosts) on every client server is a pain in the a$$ :(. So we decided to change them to something like storage1.mydomain.com and storage2.mydomain.com.
Remember that, at this point, we already had some volumes (with bricks):
$ gluster volume info MYVOL
[...]
Brick1: storage1:/MYDIR
Brick2: storage2:/MYDIR
[...]
And our /etc/hosts file was:
10.10.10.1 storage1
10.10.10.2 storage2
To implement the hostname change, we changed the /etc/hosts file to:
10.10.10.1 storage1 storage1.mydomain.com
10.10.10.2 storage2 storage2.mydomain.com
And we ran on storage1:
$ gluster peer probe storage2.mydomain.com
peer probe: success
Everything worked well for a while, but glusterd started to fail after every reboot:
$ service glusterfs-server status
glusterfs-server start/running, process 14714
$ service glusterfs-server restart
glusterfs-server stop/waiting
glusterfs-server start/running, process 14860
$ service glusterfs-server status
glusterfs-server stop/waiting
To start the service again, we had to roll back the hostname1 entry to storage2 in /var/lib/glusterd/peers/OUR_UUID.
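A minimal Python sketch of that rollback. It assumes the peer file is plain key=value text with a hostname1 key, as the hostname1 entry above suggests. The function name and the PEER_UUID placeholder are hypothetical, so check your own files and keep a backup before editing them:

```python
# Hypothetical helper: rewrite the hostname1 line of a glusterd peer file.
# Assumption: the file consists of simple "key=value" lines.

def set_peer_hostname(text, new_hostname, key="hostname1"):
    """Return `text` with every `key=...` line replaced by `key=new_hostname`."""
    lines = []
    for line in text.splitlines():
        # Compare only the part before the first "=", so hostname2 is untouched.
        if line.split("=", 1)[0] == key:
            line = f"{key}={new_hostname}"
        lines.append(line)
    return "\n".join(lines) + "\n"


before = "uuid=PEER_UUID\nhostname1=storage2.mydomain.com\n"
print(set_peer_hostname(before, "storage2"), end="")
```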
After some trial and error, we discovered that if we changed the order of the entries in /etc/hosts and repeated the process, everything worked.
So we checked the glusterd debug log and the GlusterFS source code, and discovered that the big secret was the function glusterd_friend_find_by_hostname, in the file xlators/mgmt/glusterd/src/glusterd-utils.c. This function is called for each brick that isn't a local brick and does the following:
- It checks if the brick hostname is equal to some peer hostname;
  - If it is, this peer is our wanted friend;
- If not, it gets the brick IP (resolving the brick hostname with the function getaddrinfo) and checks if the brick IP is equal to the peer hostname;
  - If it is, this peer is our wanted friend;
  - That is, we could have run "gluster peer probe 10.10.10.2" instead: since storage2 resolves to 10.10.10.2, the brick IP would have been equal to the peer "hostname" (10.10.10.2);
- If not, it gets the reverse name of the brick IP (using the function getnameinfo) and checks if it is equal to the peer hostname;
  - If it is, this peer is our wanted friend;
  - This is why changing the order of the entries in /etc/hosts worked as a workaround for us: the reverse lookup returns the first name listed for the IP, so putting storage2.mydomain.com before storage2 made it match the peer hostname;
- If not, it returns an error (and glusterd fails).
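The lookup order above can be modelled in a few lines of Python. This is a simplified sketch, not glusterd's C code. It assumes names are resolved only from an /etc/hosts-style table, where a reverse lookup returns the first name on the matching line. The helper names are hypothetical:

```python
# Simplified model of the matching steps; not the real glusterd implementation.

def parse_hosts(text):
    """Return (ip, [names]) entries from /etc/hosts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries

def forward(entries, name):
    """IPs whose line lists `name` (stand-in for getaddrinfo)."""
    return [ip for ip, names in entries if name in names]

def reverse(entries, ip):
    """First name on the first line for `ip` (stand-in for getnameinfo)."""
    for entry_ip, names in entries:
        if entry_ip == ip:
            return names[0]
    return None

def find_friend_by_hostname(entries, brick_host, peer_hosts):
    """Return the matching peer hostname, or None if no peer matches."""
    # Step 1: the brick hostname is literally one of the peer hostnames.
    if brick_host in peer_hosts:
        return brick_host
    brick_ips = forward(entries, brick_host)
    # Step 2: a brick IP equals a peer "hostname" (a peer probed by IP).
    for ip in brick_ips:
        if ip in peer_hosts:
            return ip
    # Step 3: the reverse name of a brick IP equals a peer hostname.
    for ip in brick_ips:
        name = reverse(entries, ip)
        if name in peer_hosts:
            return name
    # No match: this is where glusterd reports an error.
    return None


old_hosts = parse_hosts("""
10.10.10.1 storage1 storage1.mydomain.com
10.10.10.2 storage2 storage2.mydomain.com
""")
swapped_hosts = parse_hosts("""
10.10.10.1 storage1.mydomain.com storage1
10.10.10.2 storage2.mydomain.com storage2
""")
peers = {"storage2.mydomain.com"}  # storage2 was probed by its new name

print(find_friend_by_hostname(old_hosts, "storage2", peers))           # None
print(find_friend_by_hostname(swapped_hosts, "storage2", peers))       # storage2.mydomain.com
print(find_friend_by_hostname(old_hosts, "storage2", {"10.10.10.2"}))  # 10.10.10.2
```

With the original order, the brick name storage2 reverse-resolves to storage2, which is not the peer hostname, so the model returns None. Swapping the order, or probing by IP, produces a match.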
However, we think that comparing the brick IPs (obtained by resolving the brick hostname) with the peer IPs (obtained by resolving the peer hostname) would be a simpler and more comprehensive solution. Even when the brick and the peer use different hostnames, they still resolve to the same IP, so the match would succeed.
The solution could be:
- It checks if the brick hostname is equal to some peer hostname;
  - If it is, this peer is our wanted friend;
- If not, it gets both the brick IPs (resolving the brick hostname with the function getaddrinfo) and the peer IPs (resolving the peer hostname), and for each IP pair checks if the brick IP is equal to the peer IP;
  - If it is, this peer is our wanted friend;
- If not, it returns an error (and glusterd fails).
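Here is a Python sketch of the proposed rule, with the same caveats as any model of glusterd's C code. find_friend_by_address is a hypothetical name. resolve is any function that maps a hostname to its set of addresses: system_resolve wraps socket.getaddrinfo, and table_resolve uses a fixed table for illustration.

```python
import socket

def system_resolve(host):
    """Addresses for `host` via socket.getaddrinfo; empty set if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}

def find_friend_by_address(resolve, brick_host, peer_hosts):
    """Return the first peer hostname that shares an address with the brick."""
    # Step 1: exact hostname match, as today.
    if brick_host in peer_hosts:
        return brick_host
    # Step 2: resolve both sides and look for any common address.
    brick_ips = resolve(brick_host)
    for peer in sorted(peer_hosts):
        if brick_ips & resolve(peer):
            return peer
    # No match: glusterd would report an error.
    return None


table = {
    "storage2": {"10.10.10.2"},
    "storage2.mydomain.com": {"10.10.10.2"},
    "10.10.10.2": {"10.10.10.2"},
}

def table_resolve(host):
    """Illustrative resolver backed by a fixed table."""
    return table.get(host, set())

print(find_friend_by_address(table_resolve, "storage2", {"storage2.mydomain.com"}))
```

Only addresses are compared, so the result no longer depends on the order of names in /etc/hosts.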
What do you think about it?
Rarylson Freitas
Computer Engineer
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel