I have completed the patches and pushed them for review. Please feel
free to raise your concerns/suggestions on the reviews.
https://review.gluster.org/#/c/glusterfs/+/21868
https://review.gluster.org/#/c/glusterfs/+/21907
https://review.gluster.org/#/c/glusterfs/+/21960
https://review.gluster.org/#/c/glusterfs/+/21989/
Regards
Rafi KC
On 12/24/18 3:58 PM, RAFI KC wrote:
On 12/21/18 6:56 PM, Sankarshan Mukhopadhyay wrote:
On Fri, Dec 21, 2018 at 6:30 PM RAFI KC
<rkavunga@xxxxxxxxxx> wrote:
Hi All,
What is the problem?
As of now the self-heal client runs as one daemon per node; even if
there are multiple volumes, there will only be one self-heal daemon.
So for each configuration change in the cluster to take effect, the
self-heal daemon has to be reconfigured, but it does not have the
ability to reconfigure dynamically. This means that when you have a
lot of volumes in the cluster, every management operation that
involves configuration changes, like volume start/stop, add/remove
brick, etc., will result in a self-heal daemon restart. If such
operations are executed often, this not only slows down self-heal for
a volume, but also grows the self-heal logs substantially.
What is the value of the number of volumes when you write "a lot of
volumes"? 1000 volumes, or more?
Yes, more than 1000 volumes. It also depends on how often you execute
the glusterd management operations mentioned above. Each time the
self-heal daemon is restarted, it prints the entire graph, and these
graph traces contribute the majority of the log's size.
How to fix it?
We are planning to follow a procedure of attaching/detaching graphs
dynamically, similar to what is done for brick multiplexing. The
detailed steps are as below:
1) The first step is to make shd a per-volume daemon, so that volfiles
are generated/reconfigured on a per-volume basis.
    1.1) This will help to attach the volfiles easily to the existing
shd daemon.
    1.2) This will help to send notifications to the shd daemon, as
each volinfo keeps the daemon object.
    1.3) Reconfiguring a particular subvolume is easier, as we can
check the topology better.
    1.4) With this change the volfiles will be moved to the
workdir/vols/ directory (a rough sketch of the path layout is given
after this list).
2) Write new rpc requests like attach/detach_client_graph to support
client attach/detach.
    2.1) Functions like graph reconfigure and mgmt_getspec_cbk also
have to be modified.
3) Safely detach a subvolume when there are pending frames to unwind
(see the second sketch after this list).
    3.1) We can mark the client disconnected and make all the pending
frames unwind with ENOTCONN.
    3.2) We can wait for all the i/o to unwind until the new, updated
subvol attaches.
4) Handle scenarios like glusterd restart, node reboot, etc.
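To make the per-volume volfile idea in step 1 a bit more concrete,
here is a minimal sketch. The helper name build_shd_volfile_path(),
the WORKDIR value and the <volname>-shd.vol naming are assumptions
made for illustration only, not the actual glusterd layout:

/* Sketch: derive a per-volume shd volfile path once volfiles live
 * under workdir/vols/.  All names here are illustrative assumptions. */
#include <stdio.h>

#define WORKDIR "/var/lib/glusterd"   /* assumed default glusterd workdir */

static void
build_shd_volfile_path(const char *volname, char *path, size_t len)
{
        /* one self-heal volfile per volume, e.g.
         * /var/lib/glusterd/vols/<volname>/<volname>-shd.vol */
        snprintf(path, len, "%s/vols/%s/%s-shd.vol", WORKDIR, volname, volname);
}

int
main(void)
{
        char path[4096];

        build_shd_volfile_path("testvol", path, sizeof(path));
        printf("%s\n", path);  /* prints the assumed per-volume shd volfile path */
        return 0;
}

With a layout like this, each volume's heal graph can be generated and
attached independently instead of regenerating one node-wide volfile.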
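For step 3.1, here is a rough sketch of the safe-detach idea: mark the
client disconnected and fail every pending frame with ENOTCONN before
the updated subvolume is attached. The structures and function names
(shd_client, pending_frame, shd_client_detach) are made up for
illustration and are not GlusterFS internals:

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical pending-frame record */
struct pending_frame {
        struct pending_frame *next;
        void (*unwind)(struct pending_frame *frame, int op_ret, int op_errno);
};

/* hypothetical per-subvolume client state inside shd */
struct shd_client {
        bool connected;
        struct pending_frame *pending;   /* frames not yet unwound */
};

static void
shd_client_detach(struct shd_client *client)
{
        struct pending_frame *frame = client->pending;

        /* 3.1: mark the client disconnected first, so no new frames are
         * wound on this subvolume while the old ones are drained */
        client->connected = false;

        /* fail every outstanding frame with ENOTCONN */
        while (frame) {
                struct pending_frame *next = frame->next;

                frame->unwind(frame, -1, ENOTCONN);
                frame = next;
        }
        client->pending = NULL;
}

static void
noop_unwind(struct pending_frame *frame, int op_ret, int op_errno)
{
        (void)frame;
        (void)op_ret;
        (void)op_errno;   /* a real xlator would STACK_UNWIND the fop here */
}

int
main(void)
{
        struct pending_frame f = { .next = NULL, .unwind = noop_unwind };
        struct shd_client client = { .connected = true, .pending = &f };

        shd_client_detach(&client);   /* f is failed with ENOTCONN */
        return 0;
}

Step 3.2 would then simply wait for this drain to finish before
attaching the updated subvolume graph.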
At the moment we are not planning to limit the number of heal
subvolumes per process, because with the current approach as well,
heal for every volume was already being done from a single process,
and we have not heard any major complaints about this.
Is the plan to never limit it, or to have a throttle set to a default
high(er) value? How would system resources be impacted if the proposed
design is implemented?
The plan is to implement it in a way that can support more than one
multiplexed self-heal daemon. The throttling function as of now
returns the same process to multiplex onto, but it can easily be
modified to create a new process. This multiplexing logic won't
utilize any more resources than it currently does.
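To illustrate what "the throttling function returns the same process"
could look like, here is a simplified sketch; pick_shd_process(),
shd_proc, spawn_new_shd() and MAX_VOLS_PER_SHD are assumed names for
illustration, not the actual glusterd code:

#include <stdlib.h>

#define MAX_VOLS_PER_SHD 0   /* 0 == no limit, i.e. the current behaviour */

struct shd_proc {
        int pid;
        int nvolumes;   /* heal graphs currently attached to this process */
};

static struct shd_proc *
spawn_new_shd(void)
{
        /* placeholder: a real implementation would fork/exec glusterfs
         * with the shd volfile and register the process with glusterd */
        return calloc(1, sizeof(struct shd_proc));
}

/* Decide which shd process a volume's heal graph should attach to.
 * The caller attaches the graph and bumps nvolumes afterwards. */
static struct shd_proc *
pick_shd_process(struct shd_proc *existing)
{
        if (!existing)
                return spawn_new_shd();

        /* current behaviour: always reuse the same process ... */
        if (MAX_VOLS_PER_SHD == 0 || existing->nvolumes < MAX_VOLS_PER_SHD)
                return existing;

        /* ... but the throttle can be changed to create a new one */
        return spawn_new_shd();
}

int
main(void)
{
        struct shd_proc *first = pick_shd_process(NULL);    /* spawns one */
        struct shd_proc *again = pick_shd_process(first);   /* reuses it */

        (void)again;
        free(first);
        return 0;
}

Changing the throttle then only means changing this one decision
point, without touching the attach/detach machinery itself.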
Rafi KC
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel