Re: Mount problems when secondary node down


On 11/10/2014 11:47 PM, A F wrote:
Hello,

I have two servers, 192.168.0.10 and 192.168.2.10, running gluster 3.6.1 (installed from the gluster repo) on AWS Linux. Both servers are fully reachable on the LAN.
# rpm -qa|grep gluster
glusterfs-3.6.1-1.el6.x86_64
glusterfs-server-3.6.1-1.el6.x86_64
glusterfs-libs-3.6.1-1.el6.x86_64
glusterfs-api-3.6.1-1.el6.x86_64
glusterfs-cli-3.6.1-1.el6.x86_64
glusterfs-fuse-3.6.1-1.el6.x86_64

These are the commands I ran:
# gluster peer probe 192.168.2.10
# gluster volume create aloha replica 2 transport tcp 192.168.0.10:/var/aloha 192.168.2.10:/var/aloha force
# gluster volume start aloha
# gluster volume set aloha network.ping-timeout 5
# gluster volume set aloha nfs.disable on
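For reference, the options can be verified afterwards with the standard CLI (the expected "Options Reconfigured" entries below are what these set commands should produce, not captured output):

```shell
# Confirm the options were applied to the volume
gluster volume info aloha
# The "Options Reconfigured" section should list:
#   network.ping-timeout: 5
#   nfs.disable: on
```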

Problem number 1:
tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log shows the log cluttered with:
[2014-11-10 17:41:26.328796] W [socket.c:611:__socket_rwv] 0-management: readv on /var/run/38c520c774793c9cdae8ace327512027.socket failed (Invalid argument)
This happens every 3 seconds on both servers. It seems related to NFS and probably rpcbind, but I want both of those disabled. As you can see, I've set the volume to disable NFS - so why doesn't gluster keep quiet about it?
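One hedged guess, in case it helps: the socket in that message looks like the one glusterd polls for the built-in NFS server, and setting nfs.disable on an already-running volume may leave glusterd polling a dead socket until it is restarted. This is a sketch of a workaround, not a confirmed fix:

```shell
# Hedged workaround: restart glusterd so it re-reads the volume
# options and stops polling the defunct NFS server socket
# (EL6 init script; the service name/path may differ elsewhere)
service glusterd restart

# Then watch whether the readv warnings stop
tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
```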

Problem number 2:
in fstab on server 192.168.0.10:   192.168.0.10:/aloha /var/www/hawaii      glusterfs       defaults,_netdev        0 0
in fstab on server 192.168.2.10:   192.168.2.10:/aloha /var/www/hawaii      glusterfs       defaults,_netdev        0 0
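For completeness, each fstab line could also name the other node as a fallback volfile server; backupvolfile-server is a standard glusterfs-fuse mount option, though it only covers fetching the volfile and would not by itself fix the hang described below (example for 192.168.0.10):

```
192.168.0.10:/aloha  /var/www/hawaii  glusterfs  defaults,_netdev,backupvolfile-server=192.168.2.10  0 0
```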

If I shut down one of the servers (192.168.2.10) and reboot the remaining one (192.168.0.10), it won't come back up as fast as it should: it lags a few minutes waiting for gluster. After it eventually boots, the mount point is not mounted and the volume is stopped:
# gluster volume status
Status of volume: aloha
Gluster process                                         Port Online  Pid
------------------------------------------------------------------------------
Brick 192.168.0.10:/var/aloha                           N/A N       N/A
Self-heal Daemon on localhost                           N/A N       N/A

Task Status of Volume aloha
------------------------------------------------------------------------------
There are no active volume tasks

This didn't happen before. Fine - I stop the volume and start it again, and the brick now shows as online:
Brick 192.168.0.10:/var/aloha                           49155 Y       3473
Self-heal Daemon on localhost                           N/A Y       3507

# time mount -a
real    2m7.307s

# time mount -t glusterfs 192.168.0.10:/aloha /var/www/hawaii
real    2m7.365s
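In case it narrows things down, the FUSE client accepts a log-level mount option (standard glusterfs-fuse), so the slow mount can be retried with more verbose client-side logging:

```shell
# Retry the mount with verbose client logging to see where the
# ~2 minutes are spent; the client log lands under /var/log/glusterfs/
mount -t glusterfs -o log-level=DEBUG 192.168.0.10:/aloha /var/www/hawaii
```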

# strace mount -t glusterfs 192.168.0.10:/aloha /var/www/hawaii
(attached)

# tail /var/log/glusterfs/* -f|grep -v readv
(attached)

I've done this setup before, so I'm amazed it doesn't work. I even have the same options and setup in production at the moment, and there, for example, I'm not getting the readv errors. I'm unable to test the mount behaviour in production, but I believe I covered it back when I was testing the environment.
Any help is kindly appreciated.
CC glusterd folks

Pranith


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users


