Implementing multiplexing for self heal client.

RAFI KC <rkavunga@xxxxxxxxxx> · Fri, 21 Dec 2018 18:30:13 +0530

Hi All,

What is the problem?
As of now, the self-heal client runs as one daemon per node. This means 
that even if there are multiple volumes, there is only one self-heal 
daemon. For a configuration change in the cluster to take effect, the 
self-heal daemon has to be reconfigured, but it has no ability to 
reconfigure dynamically. So when you have a lot of volumes in the 
cluster, every management operation that involves a configuration change, 
such as volume start/stop or add/remove brick, results in a self-heal 
daemon restart. If such operations are executed often, this not only 
slows down self-heal for a volume, but also increases the self-heal logs 
substantially.

How to fix it?

We are planning to attach/detach graphs dynamically, following a 
procedure similar to brick multiplexing. The detailed steps are as 
below:

1) The first step is to make shd a per-volume daemon, generating and 
reconfiguring volfiles on a per-volume basis.

  1.1) This makes it easy to attach the volfiles to an existing shd daemon

  1.2) This helps to send notifications to the shd daemon, as each 
volinfo keeps the daemon object

  1.3) Reconfiguring a particular subvolume is easier, as we can check 
the topology better

  1.4) With this change, the volfiles will be moved to the workdir/vols/ 
directory.

2) Write new rpc requests, like attach/detach_client_graph functions, to 
support attaching and detaching clients

  2.1) Functions like graph reconfigure and mgmt_getspec_cbk also have 
to be modified

3) Safely detaching a subvolume when there are pending frames to unwind.

  3.1) We can mark the client as disconnected and make all the frames 
unwind with ENOTCONN

  3.2) We can make all the i/o wait to unwind until the new, updated 
subvol attaches

4) Handle scenarios like glusterd restart, node reboot, etc
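
To make steps 2 and 3.1 a bit more concrete, here is a minimal sketch in 
Python. It is not Gluster code and only models the idea: one process 
holds one client graph per volume, graphs are attached and detached 
independently, and a detach marks the client as disconnected and unwinds 
every pending frame with ENOTCONN. All names in it (MuxShd, ClientGraph, 
Frame, attach, detach, submit) and the volfile placeholder strings are 
hypothetical and chosen for illustration only.

```python
import errno
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """Hypothetical stand-in for a pending call frame awaiting a reply."""
    name: str
    on_unwind: Callable[[str, int], None]


@dataclass
class ClientGraph:
    """Hypothetical per-volume client graph inside the shared process."""
    volname: str
    volfile: str
    connected: bool = True
    pending: List[Frame] = field(default_factory=list)


class MuxShd:
    """Toy model of one self-heal process serving many volume graphs."""

    def __init__(self) -> None:
        self.graphs: Dict[str, ClientGraph] = {}

    def attach(self, volname: str, volfile: str) -> ClientGraph:
        # Attaching one volume does not touch any other attached graph.
        if volname in self.graphs:
            raise ValueError(f"{volname} is already attached")
        graph = ClientGraph(volname, volfile)
        self.graphs[volname] = graph
        return graph

    def submit(self, volname: str, frame: Frame) -> None:
        graph = self.graphs.get(volname)
        if graph is None or not graph.connected:
            # No usable graph for this volume: fail the frame right away.
            frame.on_unwind(frame.name, errno.ENOTCONN)
            return
        graph.pending.append(frame)

    def detach(self, volname: str) -> int:
        # Step 3.1: mark disconnected first so no new frames are queued,
        # then unwind everything still pending with ENOTCONN.
        graph = self.graphs.pop(volname)
        graph.connected = False
        drained = 0
        while graph.pending:
            frame = graph.pending.pop(0)
            frame.on_unwind(frame.name, errno.ENOTCONN)
            drained += 1
        return drained


# Usage: two volumes share one process; detaching vol1 leaves vol2 alone.
results = []
record = lambda name, err: results.append((name, err))

shd = MuxShd()
shd.attach("vol1", "<volfile for vol1>")
shd.attach("vol2", "<volfile for vol2>")
shd.submit("vol1", Frame("heal-1", record))
drained = shd.detach("vol1")
shd.submit("vol1", Frame("heal-2", record))  # arrives after the detach
```

The sketch covers only option 3.1. Option 3.2 would instead keep the 
pending list around and replay it once the updated graph for the same 
volume is attached.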

At the moment we are not planning to limit the number of heal subvolumes 
per process, because with the current approach too, healing for every 
volume is done from a single process, and we have not heard any major 
complaints about this.

Regards

Rafi KC

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel