Hi everyone,
I managed to fix my problem. It was caused by a stale gluster process that was still holding the ports. I manually killed all of the old processes and removed the corresponding files under /var/run before starting a new instance of gluster, and it now works fine.
What I did was:
service glusterfs-server stop
kill -9 `ps -ef | grep gl | grep -v grep | awk '{print $2}'`
rm -r /var/lib/glusterd/geo-replication /var/lib/glusterd/glustershd /var/lib/glusterd/groups /var/lib/glusterd/hooks /var/lib/glusterd/nfs /var/lib/glusterd/options /var/lib/glusterd/peers /var/lib/glusterd/quotad /var/lib/glusterd/vols
service glusterfs-server start
gluster peer probe gfs1
service glusterfs-server restart
gluster volume sync gfs1 all
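
In case anyone hits the same thing: before killing anything, a quick way to confirm that a leftover process is still holding the Gluster ports is something like the following (a rough sketch; tool availability and port numbers depend on your system):

ss -tlnp | grep gluster      # or: netstat -tlnp | grep gluster
ps -ef | grep gluster | grep -v grep

If an old glusterd or glusterfsd PID shows up against a port the new instance needs, that is the one to kill.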
Thank You Kindly,
Kaamesh
Bioinformatician
Novocraft Technologies Sdn Bhd
C-23A-05, 3 Two Square, Section 19, 46300 Petaling Jaya
Selangor Darul Ehsan
Malaysia
Mobile: +60176562635
Ph: +60379600541
Fax: +60379600540
On Fri, Feb 27, 2015 at 8:51 AM, Kaamesh Kamalaaharan <kaamesh@xxxxxxxxxxxxx> wrote:
Hi Atin,
I have tried flushing the iptables rules and this time I managed to get the peer into the cluster. However, the self-heal daemon is still offline and I am unable to bring it back online on gfs2. Running a heal on either server gives me a successful output, but when I check the heal info I get many split-brain errors on gfs2.

Thank You Kindly,
Kaamesh

On Thu, Feb 26, 2015 at 5:40 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
Could you check the N/W firewall settings? Flush the iptables rules using
iptables -F and retry.
~Atin
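
For anyone who would rather not flush every firewall rule with iptables -F: a minimal sketch of opening just the ports Gluster typically needs, assuming the default assignments of TCP 24007 for glusterd management and TCP 49152 upwards for the bricks (adjust the range to your brick count):

iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT
iptables -A INPUT -p tcp --dport 49152:49251 -j ACCEPT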
On 02/26/2015 02:55 PM, Kaamesh Kamalaaharan wrote:
> Hi guys,
>
> I managed to get gluster running, but I'm having a couple of issues with my
> setup: 1) my peer status is Rejected but Connected, and 2) my self-heal daemon
> is not running on one server and I'm getting split-brain files.
> My setup is two gluster servers (gfs1 and gfs2) hosting a single replicated
> volume, each contributing one brick.
>
> 1) My peer status doesn't go into Peer in Cluster. Running a peer status
> command gives me State: Peer Rejected (Connected). At this point, the brick
> on gfs2 does not come online and I get this output
>
>
> #gluster volume status
>
> Status of volume: gfsvolume
>
> Gluster process Port Online Pid
>
> ------------------------------------------------------------------------------
>
> Brick gfs1:/export/sda/brick 49153 Y 15025
>
> NFS Server on localhost 2049 Y 15039
>
> Self-heal Daemon on localhost N/A Y 15044
>
>
>
> Task Status of Volume gfsvolume
>
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
>
>
> I have followed the methods used in one of the threads and performed the
> following
>
> a) stop glusterd
> b) rm all files in /var/lib/glusterd/ except for glusterd.info
> c) start glusterd, probe gfs1 from gfs2, and run peer status (a rough command
> sketch of steps a-c follows the output below), which gives me
>
>
> # gluster peer status
>
> Number of Peers: 1
>
>
> Hostname: gfs1
>
> Uuid: 49acc9c2-4809-4da5-a6f0-6a3d48314070
>
> State: Sent and Received peer request (Connected)
>
>
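> For reference, a rough command-level sketch of steps a)-c), assuming the
> Debian/Ubuntu glusterfs-server service name (adjust for your distribution):
>
> service glusterfs-server stop
> find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
> service glusterfs-server start
> gluster peer probe gfs1
> gluster peer status
>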
> The same thread mentioned that changing the status of the peer in
> /var/lib/glusterd/peers/{UUID} from status=5 to status=3 fixes this (a one-line
> sketch of that edit follows the output below), and on restart of gfs1 the peer
> status goes to
>
> #gluster peer status
>
> Number of Peers: 1
>
>
> Hostname: gfs1
>
> Uuid: 49acc9c2-4809-4da5-a6f0-6a3d48314070
>
> State: Peer in Cluster (Connected)
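>
> A minimal sketch of that edit, using the field name exactly as quoted in the
> thread (the key name and the peer file layout can vary between Gluster
> versions, so inspect the file before changing anything):
>
> sed -i 's/^status=5$/status=3/' /var/lib/glusterd/peers/{UUID}
>
> where {UUID} is the peer UUID reported by gluster peer status.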
>
> This fixes the connection between the peers and the volume status shows
>
>
> Status of volume: gfsvolume
>
> Gluster process Port Online Pid
>
> ------------------------------------------------------------------------------
>
> Brick gfs1:/export/sda/brick 49153 Y 10852
>
> Brick gfs2:/export/sda/brick 49152 Y 17024
>
> NFS Server on localhost N/A N N/A
>
> Self-heal Daemon on localhost N/A N N/A
>
> NFS Server on gfs2 N/A N N/A
>
> Self-heal Daemon on gfs2 N/A N N/A
>
>
>
> Task Status of Volume gfsvolume
>
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
>
> Which brings us to problem 2
>
> 2) My self-heal daemon is not alive
>
> I fixed the self-heal on gfs1 by running the crawl below (an alternative using
> the gluster heal command is sketched after the status output further down)
>
> #find <gluster-mount> -noleaf -print0 | xargs --null stat >/dev/null
> 2>/var/log/gluster/<gluster-mount>-selfheal.log
>
> and running a volume status command gives me
>
> # gluster volume status
>
> Status of volume: gfsvolume
>
> Gluster process Port Online Pid
>
> ------------------------------------------------------------------------------
>
> Brick gfs1:/export/sda/brick 49152 Y 16660
>
> Brick gfs2:/export/sda/brick 49152 Y 21582
>
> NFS Server on localhost 2049 Y 16674
>
> Self-heal Daemon on localhost N/A Y 16679
>
> NFS Server on gfs2 N/A N 21596
>
> Self-heal Daemon on gfs2 N/A N 21600
>
>
>
> Task Status of Volume gfsvolume
>
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
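>
> As an alternative to the full stat crawl above, Gluster's own heal commands can
> trigger and inspect healing; a rough sketch, assuming the volume name gfsvolume
> from the output above:
>
> gluster volume heal gfsvolume                     # heal files that need healing
> gluster volume heal gfsvolume full                # force a full heal
> gluster volume heal gfsvolume info                # list entries still pending heal
> gluster volume heal gfsvolume info split-brain    # list split-brain entries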
>
>
>
> However, running this on gfs2 doesn't fix the daemon.
>
> Restarting the gfs2 server brings me back to problem 1, and the cycle
> continues.
>
> Can anyone assist me with these issues? Thank you.
>
> Thank You Kindly,
> Kaamesh
>
>
>
>
--
~Atin
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users