Re: Priority based ping packet for 3.10

Raghavendra G <raghavendra@xxxxxxxxxxx> · Fri, 21 Apr 2017 11:43:58 +0530

Summing up various discussions I had on this,
1. The current ping framework should measure only the responsiveness of the network and rpc layer. This means poller threads shouldn't be winding individual fops at all, as doing so can delay reading the ping requests. Instead, they can queue the requests to a common work queue, and other threads should pick them up.
2. We also need another tool to measure the responsiveness of the entire brick xlator stack. This tool can have a slightly larger timeout than the ping timeout, as responses will naturally be delayed. Whether this tool should measure the responsiveness of the backend fs is an open question, as we already have a posix health checker that measures it and sends a CHILD_DOWN when the backend fs is not responsive. There are also open questions here, such as which data structures (inode, fd, mem-pools etc.) the various xlators access as part of this fop. Accessing different data structures will result in different latencies.
3. Currently, ping packets are not sent by a client when there is no I/O from it. As per the discussions above, the client should measure responsiveness even when there is no traffic to/from it. Maybe the interval at which ping packets are sent can be increased.
4. We've fixed some lock contention issues on the brick stack due to high latency on backend fs. However, this is on-going work as contentions can be found in various codepaths (mem-pool etc).
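
To make point 1 concrete, here is a rough sketch of the hand-off, written in Python for brevity rather than in the brick's own C. All names (poller, worker, slow_fop, run) are hypothetical and do not correspond to anything in the GlusterFS code; it only illustrates a poller that answers pings inline and queues everything else for other threads.

```python
import queue
import threading
import time

# Hypothetical names throughout; none of these come from the GlusterFS code.
PING = "ping"


def slow_fop(name):
    """Stand-in for a fop that blocks on a slow backend."""
    time.sleep(0.2)
    return f"done:{name}"


def poller(incoming, work_queue, replies):
    """Read requests in arrival order; never run a fop on this thread."""
    for req in incoming:
        if req == PING:
            # Answered inline, so ping latency reflects only the
            # network/rpc layer and not the brick stack.
            replies.append("pong")
        else:
            work_queue.put(req)  # hand off and go straight back to reading


def worker(work_queue, results):
    """Pick requests off the shared queue and run them."""
    while True:
        req = work_queue.get()
        if req is None:  # sentinel: shut down
            break
        results.append(slow_fop(req))


def run(incoming, n_workers=2):
    work_queue, replies, results = queue.Queue(), [], []
    workers = [threading.Thread(target=worker, args=(work_queue, results))
               for _ in range(n_workers)]
    for t in workers:
        t.start()
    start = time.monotonic()
    poller(incoming, work_queue, replies)
    poll_time = time.monotonic() - start  # time the poller was busy
    for _ in workers:
        work_queue.put(None)
    for t in workers:
        t.join()
    return replies, results, poll_time
```

With this shape, a ping that arrives behind several slow fops is answered as soon as the poller reads it, because the poller only pays the cost of a queue insert per fop.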

We'll shortly send a fix for 1. The other items will be picked up as bandwidth permits. Contributions are welcome :).
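
For point 3, one possible shape is a ping schedule that never stops but backs off while the client is idle. This is only a sketch in Python; the class name and all three values below are assumptions picked for illustration, not existing options or defaults.

```python
from dataclasses import dataclass


@dataclass
class PingSchedule:
    # All three values are assumed, illustrative numbers (seconds).
    busy_interval: float = 10.0  # gap between pings while I/O is flowing
    idle_interval: float = 60.0  # longer gap once the client is idle
    idle_after: float = 30.0     # quiet period before the client counts as idle

    def interval(self, now: float, last_io: float) -> float:
        """Return the delay before the next ping; pings are never disabled."""
        idle_for = now - last_io
        if idle_for >= self.idle_after:
            return self.idle_interval
        return self.busy_interval
```

The point of the design is that an idle client still notices an unresponsive brick, just with a longer detection delay than a busy one.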

regards,
Raghavendra.

On Wed, Jan 25, 2017 at 11:01 AM, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
Yes, the earlier a fault is detected the better.

On January 24, 2017 9:21:27 PM PST, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
 If there are no responses to be received and no requests being
 sent to a brick, why would a client be interested in the health of
 the server/brick?

The client (code) might not, but the user might want to find out and fix
the fault before the brick gets busy again.

Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel

-- 
Raghavendra G
