Re: A few queries on self-healing and AFR (glusterfs 3.4.2)

A Ghoshal <a.ghoshal@xxxxxxx> · Tue, 3 Feb 2015 04:11:20 +0530

It seems I have found out what goes wrong here, and it was useful learning for me:

On one of the replica servers, the client mount had no open connection to
the brick process (glusterfsd) on the other server. To illustrate:

root@serv1:/root> ps -ef | grep replicated_vol
root     30627     1  0 Jan29 ?        00:17:30 /usr/sbin/glusterfs --volfile-id=replicated_vol --volfile-server=serv1 /mnt/replicated_vol
root     31132 18322  0 23:04 pts/1    00:00:00 grep _opt_kapsch_cnp_data_memusage
root     31280     1  0 06:32 ?        00:09:10 /usr/sbin/glusterfsd -s serv1 --volfile-id replicated_vol.serv1.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/replicated_vol/run/serv1-mnt-bricks-replicated_vol-brick.pid -S /var/run/4d70e99b47c1f95cc2eab1715d3a9b67.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-bricks.log --xlator-option *-posix.glusterd-uuid=c7930be6-969f-4f62-b119-c5bbe4df22a3 --brick-port 49172 --xlator-option replicated_vol.listen-port=49172

root@serv1:/root> netstat -p | grep 30627
tcp        0      0 serv1:715           serv1:24007         ESTABLISHED 30627/glusterfs <= client<->local glusterd
tcp        0      0 serv1:863           serv1:49172         ESTABLISHED 30627/glusterfs <= client<->local brick

root@serv1:/root>

However, the client on the other server did have connections open to both
bricks, so whatever one wrote on that server synced over immediately.

root@serv0:/root> ps -ef | grep replicated_vol
root     12761  7556  0 23:05 pts/1    00:00:00 replicated_vol
root     15067     1  0 06:32 ?        00:04:50 /usr/sbin/glusterfsd -s serv1 --volfile-id replicated_vol.serv1.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/replicated_vol/run/serv1-mnt-bricks-replicated_vol-brick.pid -S /var/run/f642d7dbff0ab7a475a23236f6f50b33.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-bricks.log --xlator-option *-posix.glusterd-uuid=13df1bd2-6dc8-49fa-ade0-5cd95f6b1f19 --brick-port 49209 --xlator-option replicated_vol.listen-port=49209
root     30587     1  0 Jan30 ?        00:12:17 /usr/sbin/glusterfs --volfile-id=serv --volfile-server=serv0 /mnt/replicated_vol

root@serv0:/root> netstat -p | grep 30587
tcp        0      0 serv0:859           serv1:49172         ESTABLISHED 30587/glusterfs <= client<->remote brick
tcp        0      0 serv0:746           serv0:24007         ESTABLISHED 30587/glusterfs <= client<->glusterd
tcp        0      0 serv0:857           serv0:49209         ESTABLISHED 30587/glusterfs <= client<->local brick

root@serv0:/root>

So, the client on serv1 has no open TCP link with the mate brick on serv0,
which is why it cannot write to the mate brick directly and has to rely on
the self-heal daemon to do the job instead. Of course, I now need to debug
why the connection fails, but at least we are clear on AFR.
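
In case it helps anyone repeat the check, here is a minimal sketch in Python of the comparison I did by eye above. It is only an illustration: the function names are my own, the sample rows are the netstat lines from this mail (without my annotations), and the brick endpoints are taken from the --brick-port values in the ps listings. It assumes netstat prints numeric ports, as it did here.

```python
# netstat -p rows for the two client mounts, copied from the output above.
NETSTAT_SERV1 = """\
tcp        0      0 serv1:715           serv1:24007         ESTABLISHED 30627/glusterfs
tcp        0      0 serv1:863           serv1:49172         ESTABLISHED 30627/glusterfs
"""

NETSTAT_SERV0 = """\
tcp        0      0 serv0:859           serv1:49172         ESTABLISHED 30587/glusterfs
tcp        0      0 serv0:746           serv0:24007         ESTABLISHED 30587/glusterfs
tcp        0      0 serv0:857           serv0:49209         ESTABLISHED 30587/glusterfs
"""

# Brick endpoints (host:port), from the --brick-port options shown by ps.
BRICKS = {"serv1:49172", "serv0:49209"}


def established_peers(netstat_text, pid):
    """Return the remote endpoints that process `pid` has ESTABLISHED TCP links to."""
    peers = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        remote, state, owner = fields[4], fields[5], fields[6]
        if state == "ESTABLISHED" and owner.split("/")[0] == str(pid):
            peers.add(remote)
    return peers


def missing_bricks(netstat_text, pid, bricks):
    """Return the brick endpoints that the client process is not connected to."""
    return set(bricks) - established_peers(netstat_text, pid)


if __name__ == "__main__":
    print("serv1 client is missing:", sorted(missing_bricks(NETSTAT_SERV1, 30627, BRICKS)))
    print("serv0 client is missing:", sorted(missing_bricks(NETSTAT_SERV0, 30587, BRICKS)))
```

For the serv1 client this reports serv0:49209 as missing, and for the serv0 client it reports nothing, which matches what the two netstat listings show.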

Thanks everyone.

From:     A Ghoshal <a.ghoshal@xxxxxxx>
To:       gluster-users@xxxxxxxxxxx
Date:     02/03/2015 12:00 AM
Subject:  A few queries on self-healing and AFR (glusterfs 3.4.2)
Sent by:  gluster-users-bounces@xxxxxxxxxxx

Hello,

I have a replica-2 volume in which I store a large number of files that
are updated frequently (critical log files, etc). My files are generally
stable, but one thing that does worry me from time to time is that files
show up on one of the bricks in the output of gluster volume heal <volname>
info. These entries disappear on their own after a while (I am guessing
when cluster.heal-timeout expires and another heal by the self-heal daemon
is triggered). For certain files, this could be a bit of a bother - in
terms of fault tolerance... 

I was wondering if there is a way I could force AFR to return write-completion
to the application only _after_ the data is written to both replicas successfully
(kind of, like, atomic writes) - even if it were at the cost of performance.
This way I could ensure that my bricks shall always be in sync. 

The other thing I could possibly do is reduce my cluster.heal-timeout (it
is 600 currently). Is it a bad idea to set it to something as small as
say, 60 seconds for volumes where redundancy is a prime concern? 
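
To make the change concrete, here is a minimal sketch in Python of how the option could be set per volume. It assumes the usual "gluster volume set <volname> <key> <value>" syntax; the helper names and the dry-run default are my own, for illustration only, and nothing is executed unless dry_run is switched off on a host that has the gluster CLI.

```python
import subprocess


def heal_timeout_command(volname, seconds):
    """Build the argv that would set cluster.heal-timeout on one volume."""
    if seconds <= 0:
        raise ValueError("heal timeout must be a positive number of seconds")
    return ["gluster", "volume", "set", volname, "cluster.heal-timeout", str(seconds)]


def apply_heal_timeout(volname, seconds, dry_run=True):
    """Return the command line; run it only when dry_run is False."""
    cmd = heal_timeout_command(volname, seconds)
    if not dry_run:
        # Needs the gluster CLI and sufficient privileges on this host.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run: only prints the command for the volume named in this mail.
    print(apply_heal_timeout("replicated_vol", 60))
```

Keeping the command builder separate from the part that runs it means the same helper can be looped over several volumes and reviewed before anything is changed.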

One question, though - is heal through self-heal daemon accomplished using
separate threads for each replicated volume, or is it a single thread for
every volume? The reason I ask is I have a large number of replicated file-systems
on each server (17, to be precise) but I do have a reasonably powerful
multicore processor array and large RAM and top indicates the load on the
system resources is quite moderate. 

Thanks, 

Anirban
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users