Re: 2 node replica 2 cluster - volume on one node stopped responding

Some extra points:

- 10.100.3.41 is one of the oVirt hosts.

- I only needed to restart glusterfsd and glusterd on one of the Gluster nodes (the same one I pulled the logs from) to get everything working again.

- It's a separate Gluster volume, not managed from the oVirt engine.

On 8 June 2015 at 11:35, Tiemen Ruiten <t.ruiten@xxxxxxxxxxx> wrote:
Hello,

We are running an oVirt cluster on top of a 2-node, replica 2 Gluster volume. Yesterday we suddenly noticed that VMs were not responding and quickly found out that the Gluster volume had issues. These errors were filling up the etc-glusterfs-glusterd.log file:

[2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.100.3.41:1022
[2015-06-07 08:36:26.498073] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
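A minimal Python sketch for gauging how often that warning fires and which client triggers it, assuming every line follows the layout shown above: [timestamp] LEVEL [file.c:line:function] component: message. The two numbers inside "(req ...)" appear to be an RPC program number and version. The function names and regular expressions below are hypothetical helpers for illustration, not part of GlusterFS.

```python
import re
import sys
from collections import Counter

# Assumed GlusterFS log layout (inferred from the excerpt above):
# [YYYY-MM-DD HH:MM:SS.micro] LEVEL [file.c:line:function] component: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[\d\- :.]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<component>\S+):\s+(?P<msg>.*)$"
)

# Hypothetical pattern for the "RPC program not available" warning:
# captures the two numbers in "(req N M)" and the peer address and port.
RPC_RE = re.compile(
    r"\(req (?P<prog>\d+) (?P<ver>\d+)\) for (?P<peer>[\d.]+):(?P<port>\d+)"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def count_unavailable_rpc(lines):
    """Count 'RPC program not available' warnings per peer IP address."""
    per_peer = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None or "RPC program not available" not in rec["msg"]:
            continue
        m = RPC_RE.search(rec["msg"])
        if m:
            per_peer[m.group("peer")] += 1
    return per_peer


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_rpc.py <path to glusterd log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for peer, n in count_unavailable_rpc(fh).most_common():
            print(f"{peer}\t{n}")
```

Run against the attached glusterd log, this would list each client address with its warning count, which could help confirm whether only the oVirt host at 10.100.3.41 was affected.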
 
A restart of glusterfsd and glusterd resolved the issue, but triggered a lot of self-heals.

We are running glusterfs 3.7.0 on ZFS.

I have attached etc-glusterfs-glusterd.log, the brick log file and the glustershd.log. I would be grateful if anyone could shed any light on what happened here and if there's anything we can do to prevent it.

--
Tiemen Ruiten
Systems Engineer
R&D Media



--
Tiemen Ruiten
Systems Engineer
R&D Media
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
