On 07/13/2015 10:29 PM, Tiemen Ruiten wrote:
> OK, I found what's wrong. From the brick's log:
>
> [2015-07-12 02:32:01.542934] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2015-07-13 14:21:06.722675] W [glusterfsd.c:1219:cleanup_and_exit] (-->
> 0-: received signum (15), shutting down
> [2015-07-13 14:21:35.168750] I [MSGID: 100030] [glusterfsd.c:2294:main]
> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.1
> (args: /usr/sbin/glusterfsd -s 10.100.3.10 --volfile-id
> vmimage.10.100.3.10.export-gluster01-brick -p
> /var/lib/glusterd/vols/vmimage/run/10.100.3.10-export-gluster01-brick.pid
> -S /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket --brick-name
> /export/gluster01/brick -l
> /var/log/glusterfs/bricks/export-gluster01-brick.log --xlator-option
> *-posix.glusterd-uuid=26186ec6-a8c7-4834-bcaa-24e30289dba3 --brick-port
> 49153 --xlator-option vmimage-server.listen-port=49153)
> [2015-07-13 14:21:35.178558] E [socket.c:823:__socket_server_bind]
> 0-socket.glusterfsd: binding to failed: Address already in use
> [2015-07-13 14:21:35.178624] E [socket.c:826:__socket_server_bind]
> 0-socket.glusterfsd: Port is already in use
> [2015-07-13 14:21:35.178649] W [rpcsvc.c:1602:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
>
> ps aux | grep gluster
> root      6417  0.0  0.2 753080 175016 ?      Ssl  May21  25:25
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/wwwdata
> /mnt/gluster/web/wwwdata
> root      6742  0.0  0.0 622012  17624 ?      Ssl  May21  22:31
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/conf
> /mnt/gluster/conf
> root     36575  0.2  0.0 589956  19228 ?      Ssl  16:21   0:19
> /usr/sbin/glusterd --pid-file=/run/glusterd.pid
> root     36720  0.0  0.0 565140  55836 ?      Ssl  16:21   0:02
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
> /var/run/gluster/8b9ce8bebfa8c1d2fabb62654bdc550e.socket
> root     36730  0.0  0.0 451016  22936 ?      Ssl  16:21   0:01
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/c0d7454986c96eef463d028dc8bce9fe.socket --xlator-option
> *replicate*.node-uuid=26186ec6-a8c7-4834-bcaa-24e30289dba3
> root     37398  0.0  0.0 103248    916 pts/2  S+   18:49   0:00 grep
> gluster
> root     40058  0.0  0.0 755216  60212 ?      Ssl  May21  22:06
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/fl-webroot
> /mnt/gluster/web/flash/webroot
>
> So several leftover processes. What will happen if I do a
>
> /etc/init.d/glusterd stop
> /etc/init.d/glusterfsd stop
>
> kill all remaining gluster processes and restart gluster on this node?
>
> Will the volume stay online? What about split-brain? I suppose it would be
> best to disconnect all clients first...?

Can you double-check whether any brick process is already running? If so,
kill it and try 'gluster volume start <volname> force'.
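Something along these lines should show whether a stale glusterfsd is still
holding the brick. This is only a sketch: the volume name vmimage, the brick
path and port 49153 are taken from your log, so adjust them if they differ on
your node.

  # look for a leftover brick process serving this brick directory
  ps aux | grep '[g]lusterfsd.*export-gluster01-brick'

  # see which process is still bound to the brick port from the log
  netstat -tlnp | grep 49153

  # if a stale glusterfsd shows up, stop it, then restart the brick
  kill <pid-of-stale-glusterfsd>
  gluster volume start vmimage force

Assuming this is a replica 2 volume, it should stay online through the brick
on 10.100.3.11 while you do this. Afterwards you can watch
'gluster volume heal vmimage info' until the pending heals drain, and
'gluster volume heal vmimage info split-brain' to confirm nothing actually
ended up in split-brain.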

> On 13 July 2015 at 18:25, Tiemen Ruiten <t.ruiten@xxxxxxxxxxx> wrote:
>
>> Hello,
>>
>> We have a two-node gluster cluster, running version 3.7.1, that hosts an
>> oVirt storage domain. This afternoon I tried creating a template in
>> oVirt, but within a minute VMs stopped responding and Gluster started
>> generating errors like the following:
>>
>> [2015-07-13 14:09:51.772629] W [rpcsvc.c:270:rpcsvc_program_actor]
>> 0-rpc-service: RPC program not available (req 1298437 330) for
>> 10.100.3.40:1021
>> [2015-07-13 14:09:51.772675] E [rpcsvc.c:565:rpcsvc_check_and_reply_error]
>> 0-rpcsvc: rpc actor failed to complete successfully
>>
>> I managed to get things in working order again by restarting glusterd and
>> glusterfsd, but now one brick is down:
>>
>> $ sudo gluster volume status vmimage
>> Status of volume: vmimage
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.100.3.10:/export/gluster01/brick   N/A       N/A        N       36736
>> Brick 10.100.3.11:/export/gluster01/brick   49153     0          Y       11897
>> NFS Server on localhost                     2049      0          Y       36720
>> Self-heal Daemon on localhost               N/A       N/A        Y       36730
>> NFS Server on 10.100.3.11                   2049      0          Y       11919
>> Self-heal Daemon on 10.100.3.11             N/A       N/A        Y       11924
>>
>> Task Status of Volume vmimage
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> $ sudo gluster peer status
>> Number of Peers: 1
>>
>> Hostname: 10.100.3.11
>> Uuid: f9872fea-47f5-41f6-8094-c9fabd3c1339
>> State: Peer in Cluster (Connected)
>>
>> Additionally, in the etc-glusterfs-glusterd.vol.log I see these messages
>> repeating every 3 seconds:
>>
>> [2015-07-13 16:15:21.737044] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> The message "I [MSGID: 106005]
>> [glusterd-handler.c:4667:__glusterd_brick_rpc_notify] 0-management: Brick
>> 10.100.3.10:/export/gluster01/brick has disconnected from glusterd."
>> repeated 39 times between [2015-07-13 16:13:24.717611] and [2015-07-13
>> 16:15:21.737862]
>> [2015-07-13 16:15:24.737694] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:24.738498] I [MSGID: 106005]
>> [glusterd-handler.c:4667:__glusterd_brick_rpc_notify] 0-management: Brick
>> 10.100.3.10:/export/gluster01/brick has disconnected from glusterd.
>> [2015-07-13 16:15:27.738194] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:30.738991] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:33.739735] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>>
>> Can I get this brick back up without bringing the volume/cluster down?
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media

--
~Atin

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users