Re: A few queries on self-healing and AFR (glusterfs 3.4.2)

Thank you, Krutika. We are currently planning to migrate our system to 3.5.3. Should be done in a month.

Please take a look at my follow-up mail, though, and also at http://www.gluster.org/pipermail/gluster-users/2015-February/020519.html, another thread I started some time back; I have since found out that they are basically the same problem.

Here is the problem I found. I have the following setup:

> > Volume Name: replicated_vol
> > Type: Replicate
> > Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2
> > Status: Started
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: serv0:/mnt/bricks/replicated_vol/brick
> > Brick2: serv1:/mnt/bricks/replicated_vol/brick
> > Options Reconfigured:
> > diagnostics.client-log-level: INFO
> > network.ping-timeout: 10
> > nfs.enable-ino32: on
> > cluster.self-heal-daemon: on
> > nfs.disable: off


replicated_vol is mounted using mount.glusterfs at /mnt/replicated_vol on both servers. Using `netstat` I found that while the mount client (/usr/sbin/glusterfs) on serv1 was connected to three ports (the local glusterd, plus the local and remote glusterfsd), the mount client on serv0 was connected only to the local glusterfsd and glusterd. In effect, none of the write requests serviced by the mount client on serv0 were being sent to the glusterfsd on serv1. All writes reached serv1 only later, replayed from serv0 by the shd once every cluster.heal-timeout.
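For anyone who wants to check the same thing, this is roughly what I did (volume, mount point, and host names are from my setup; <PID> is whatever pgrep returns):

    # Find the PID of the fuse mount client for /mnt/replicated_vol:
    pgrep -f 'glusterfs.*/mnt/replicated_vol'
    # List its TCP connections; a healthy replica-2 client should hold
    # connections to the local glusterd (port 24007) and to both bricks:
    netstat -ntp | grep '<PID>/'

On serv1 this showed the expected three connections; on serv0 the connection to the remote brick was simply missing.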

More investigation revealed the following: the mount client on serv0 had stale information about the listen port of the glusterfsd on serv1. On Jan 30 serv1 underwent a reboot, after which the brick port on it changed; the mount client on serv0 was never made aware of this and kept trying to connect to the old port number every 3 seconds (also filling up my /var/log in the process).
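In case it helps with debugging, the mismatch is easy to see by comparing the brick port registered with glusterd against what the client keeps retrying (the log path assumes the default naming for a /mnt/replicated_vol mount):

    # Port each brick currently listens on, as known to glusterd:
    gluster volume status replicated_vol
    # Watch the mount client's log for the repeated reconnect failures:
    tail -f /var/log/glusterfs/mnt-replicated_vol.log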

More technical details may be found in the thread I linked above. I'd greatly appreciate some advice on what to look at next. Also, we do not have a firewall on these servers - they are only test setups, not production.

Thanks again,
Anirban



From:        Krutika Dhananjay <kdhananj@xxxxxxxxxx>
To:        A Ghoshal <a.ghoshal@xxxxxxx>
Cc:        gluster-users@xxxxxxxxxxx
Date:        02/05/2015 05:44 PM
Subject:        Re: A few queries on self-healing and AFR (glusterfs 3.4.2)

From: "A Ghoshal" <a.ghoshal@xxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Tuesday, February 3, 2015 12:00:15 AM
Subject: A few queries on self-healing and AFR (glusterfs 3.4.2)


Hello,

I have a replica-2 volume in which I store a large number of files that are updated frequently (critical log files, etc.). My files are generally stable, but one thing that does worry me from time to time is that files show up on one of the bricks in the output of `gluster volume heal <volname> info`. These entries disappear on their own after a while (I am guessing when cluster.heal-timeout expires and another heal by the self-heal daemon is triggered). For certain files this could be a bit of a bother in terms of fault tolerance.

In 3.4.x, even files that are currently undergoing modification will be listed in heal-info output. So this could be the reason why the file(s) disappear from the output after a while, in which case reducing cluster.heal-timeout might not solve the problem. Since 3.5.1, heal-info _only_ reports those files which are truly undergoing heal.
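For completeness, the command in question is:

    gluster volume heal <VOLNAME> info

so in your case `gluster volume heal replicated_vol info`; on 3.4.x a file that is merely being written to can appear there transiently.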


I was wondering if there is a way I could force AFR to return write-completion to the application only _after_ the data is written to both replicas successfully (atomic writes, of a sort) - even if it comes at the cost of performance. That way I could ensure that my bricks are always in sync.

AFR has always returned write-completion status to the application only _after_ the data is written to all replicas. The appearance of files under modification in heal-info output might have led you to think the changes have not (yet) been synced to the other replica(s).


The other thing I could possibly do is reduce cluster.heal-timeout (it is currently 600). Is it a bad idea to set it to something as small as, say, 60 seconds for volumes where redundancy is a prime concern?
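The change itself would be a one-liner, of course:

    gluster volume set replicated_vol cluster.heal-timeout 60

I am just not sure whether so aggressive a value has side effects I should worry about.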


One question, though - is healing through the self-heal daemon accomplished using a separate thread for each replicated volume, or does a single thread serve all volumes? The reason I ask is that I have a large number of replicated volumes on each server (17, to be precise), but I do have a reasonably powerful multicore processor array and large RAM, and top indicates that the load on system resources is quite moderate.

There is an infra piece in gluster called syncop, by which multiple heal jobs are handled by a handful of threads. These can scale up to a maximum of 16, depending on the load. It is safe to assume that there will be one healer thread per replica set, but if the load is not too high, a single thread may do all the healing.
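If you want to observe this on your own setup, one rough way (the PID comes from the volume status output; thread counts vary with load and version):

    # The self-heal daemon's PID appears in the status output:
    gluster volume status replicated_vol
    # Inspect its threads while a heal is in progress:
    top -H -p <PID>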

-Krutika
Thanks,
Anirban



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

