On 13 July 2015 at 19:06, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
Can you double-check whether any brick process is already running? If so, kill it and try 'gluster volume start <volname> force'.
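Roughly, on the affected node, something along these lines (only a sketch; substitute the actual PID of any stale brick process for /export/gluster01/brick and your own volume name):

ps aux | grep glusterfsd            # any leftover brick process for /export/gluster01/brick?
kill <pid-of-stale-brick-process>   # plain TERM first; -9 only if it refuses to exit
gluster volume start vmimage force  # should only (re)start the bricks that are down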
On 07/13/2015 10:29 PM, Tiemen Ruiten wrote:
> OK, I found what's wrong. From the brick's log:
>
> [2015-07-12 02:32:01.542934] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2015-07-13 14:21:06.722675] W [glusterfsd.c:1219:cleanup_and_exit] (-->
> 0-: received signum (15), shutting down
> [2015-07-13 14:21:35.168750] I [MSGID: 100030] [glusterfsd.c:2294:main]
> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.1
> (args: /usr/sbin/glusterfsd -s 10.100.3.10 --volfile-id
> vmimage.10.100.3.10.export-gluster01-brick -p
> /var/lib/glusterd/vols/vmimage/run/10.100.3.10-export-gluster01-brick.pid
> -S /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket --brick-name
> /export/gluster01/brick -l
> /var/log/glusterfs/bricks/export-gluster01-brick.log --xlator-option
> *-posix.glusterd-uuid=26186ec6-a8c7-4834-bcaa-24e30289dba3 --brick-port
> 49153 --xlator-option vmimage-server.listen-port=49153)
> [2015-07-13 14:21:35.178558] E [socket.c:823:__socket_server_bind]
> 0-socket.glusterfsd: binding to failed: Address already in use
> [2015-07-13 14:21:35.178624] E [socket.c:826:__socket_server_bind]
> 0-socket.glusterfsd: Port is already in use
> [2015-07-13 14:21:35.178649] W [rpcsvc.c:1602:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
>
>
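> I have not yet checked what is actually sitting on port 49153; presumably
> something like
>
> netstat -tlnp | grep 49153
>
> (or ss -tlnp | grep 49153) run as root would show the owning PID. For now,
> here is what ps shows:
>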
> ps aux | grep gluster
> root 6417 0.0 0.2 753080 175016 ? Ssl May21 25:25
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/wwwdata
> /mnt/gluster/web/wwwdata
> root 6742 0.0 0.0 622012 17624 ? Ssl May21 22:31
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/conf
> /mnt/gluster/conf
> root 36575 0.2 0.0 589956 19228 ? Ssl 16:21 0:19
> /usr/sbin/glusterd --pid-file=/run/glusterd.pid
> root 36720 0.0 0.0 565140 55836 ? Ssl 16:21 0:02
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
> /var/run/gluster/8b9ce8bebfa8c1d2fabb62654bdc550e.socket
> root 36730 0.0 0.0 451016 22936 ? Ssl 16:21 0:01
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/c0d7454986c96eef463d028dc8bce9fe.socket --xlator-option
> *replicate*.node-uuid=26186ec6-a8c7-4834-bcaa-24e30289dba3
> root 37398 0.0 0.0 103248 916 pts/2 S+ 18:49 0:00 grep
> gluster
> root 40058 0.0 0.0 755216 60212 ? Ssl May21 22:06
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/fl-webroot
> /mnt/gluster/web/flash/webroot
>
> So there are several leftover processes. What will happen if I do
>
> /etc/init.d/glusterd stop
> /etc/init.d/glusterfsd stop
>
> and then kill all remaining gluster processes and restart gluster on this node?
>
> Will the volume stay online? What about split-brain? I suppose it would be
> best to disconnect all clients first...?
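>
> Concretely, after the two init scripts above I had something like this in
> mind (only a sketch, I have not run any of it yet):
>
> ps aux | grep gluster           # see what is still left over
> kill <remaining gluster PIDs>   # plain TERM first
> /etc/init.d/glusterd start
> gluster volume status vmimage   # check whether the brick comes back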
>
> On 13 July 2015 at 18:25, Tiemen Ruiten <t.ruiten@xxxxxxxxxxx> wrote:
>
>> Hello,
>>
>> We have a two-node gluster cluster, running version 3.7.1, that hosts an
>> oVirt storage domain. This afternoon I tried creating a template in oVirt,
>> but within a minute VMs stopped responding and Gluster started generating
>> errors like the following:
>>
>> [2015-07-13 14:09:51.772629] W [rpcsvc.c:270:rpcsvc_program_actor]
>> 0-rpc-service: RPC program not available (req 1298437 330) for
>> 10.100.3.40:1021
>> [2015-07-13 14:09:51.772675] E [rpcsvc.c:565:rpcsvc_check_and_reply_error]
>> 0-rpcsvc: rpc actor failed to complete successfully
>>
>> I managed to get things in working order again by restarting glusterd and
>> glusterfsd, but now one brick is down:
>>
>> $ sudo gluster volume status vmimage
>> Status of volume: vmimage
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.100.3.10:/export/gluster01/brick   N/A       N/A        N       36736
>> Brick 10.100.3.11:/export/gluster01/brick   49153     0          Y       11897
>> NFS Server on localhost                     2049      0          Y       36720
>> Self-heal Daemon on localhost               N/A       N/A        Y       36730
>> NFS Server on 10.100.3.11                   2049      0          Y       11919
>> Self-heal Daemon on 10.100.3.11             N/A       N/A        Y       11924
>>
>> Task Status of Volume vmimage
>>
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> $ sudo gluster peer status
>> Number of Peers: 1
>>
>> Hostname: 10.100.3.11
>> Uuid: f9872fea-47f5-41f6-8094-c9fabd3c1339
>> State: Peer in Cluster (Connected)
>>
>> Additionally, in the etc-glusterfs-glusterd.vol.log I see these messages
>> repeating every 3 seconds:
>>
>> [2015-07-13 16:15:21.737044] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> The message "I [MSGID: 106005]
>> [glusterd-handler.c:4667:__glusterd_brick_rpc_notify] 0-management: Brick
>> 10.100.3.10:/export/gluster01/brick has disconnected from glusterd."
>> repeated 39 times between [2015-07-13 16:13:24.717611] and [2015-07-13
>> 16:15:21.737862]
>> [2015-07-13 16:15:24.737694] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:24.738498] I [MSGID: 106005]
>> [glusterd-handler.c:4667:__glusterd_brick_rpc_notify] 0-management: Brick
>> 10.100.3.10:/export/gluster01/brick has disconnected from glusterd.
>> [2015-07-13 16:15:27.738194] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:30.738991] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:33.739735] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
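>>
>> I have not dug into that socket yet; presumably something like
>>
>> ls -l /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket
>> netstat -xap | grep 2bfe3a2242d586d0850775f601f1c3ee
>>
>> would show whether anything still holds that unix socket open, or whether
>> only a stale socket file is left behind (assuming net-tools is installed on
>> this node).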
>>
>> Can I get this brick back up without bringing the volume/cluster down?
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>
>
--
~Atin
Hi Atin,
Judging from the ps aux | grep gluster output, I see gluster processes for the volumes wwwdata, conf and fl-webroot, although those volumes are not started, and no brick process for vmimage at all. So you're saying: kill those leftover processes, then run 'gluster volume start vmimage force'?
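Afterwards I would check the heal state before reconnecting any clients, roughly:

gluster volume heal vmimage info
gluster volume heal vmimage info split-brain

just to be sure nothing has ended up in split-brain.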
Thank you for your response.
Tiemen Ruiten
Systems Engineer
R&D Media
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users