Re: Priority based ping packet for 3.10

On Thu, Jan 19, 2017 at 3:59 PM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
The more relevant question would be: with TCP_KEEPALIVE and TCP_USER_TIMEOUT on sockets, do we really need the ping-pong framework in clients? We might need it in transport/rdma setups, but my question is concentrating on transport/socket.

In other words, I would like to hear why we need a heartbeat mechanism in the first place. One scenario might be a healthy socket-level connection but an unhealthy brick/client (like a deadlocked one). Are there enough such realistic scenarios to make ping-pong/heartbeat necessary? In what other ways can a brick/client go bad?
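
For reference, a minimal sketch (not GlusterFS code; the values are illustrative and the options are Linux-specific) of what relying on these kernel-level options looks like on a connected TCP socket:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable kernel-level liveness detection on a connected TCP socket.
 * With these set, a dead peer or a blackholed path is detected by the
 * kernel itself, without an application-level ping-pong. */
static int
set_tcp_liveness (int sock)
{
        int          on = 1;
        int          idle = 20;  /* seconds of idle before probing starts */
        int          intvl = 2;  /* seconds between keepalive probes */
        int          cnt = 5;    /* unanswered probes before the kernel
                                    declares the connection dead */
        unsigned int user_timeout = 42 * 1000; /* ms transmitted data may
                                                  stay unacked on a busy
                                                  connection */

        if (setsockopt (sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)) < 0)
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
                        sizeof (idle)) < 0)
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
                        sizeof (intvl)) < 0)
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
                        sizeof (cnt)) < 0)
                return -1;
        if (setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout,
                        sizeof (user_timeout)) < 0)
                return -1;
        return 0;
}

Note that these only cover the transport: they say nothing about a brick or client that is alive at the TCP level but deadlocked above it, which is exactly the scenario in question.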

On Thu, Jan 19, 2017 at 3:36 PM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:


On Thu, Jan 19, 2017 at 1:50 PM, Mohammed Rafi K C <rkavunga@xxxxxxxxxx> wrote:
Hi,

The patch for priority-based ping packets [1] is ready for review. As
Shyam mentioned in the comment on patch set 12, it doesn't solve the
problem of network congestion or disk latency. Also, it won't
prioritize the reply to ping packets at the server end (we don't have a
straightforward way to identify the prognum in the reply).


So my question is: is it worth taking the patch, or do we need to think
through a more generic solution?

Though ping requests can take more time to reach the server due to heavy traffic, realistically speaking the common reasons for ping-timer expiry have been either:

1. the client not being able to read the ping response [2]
2. the server not being able to read the ping request.

Speaking of 2 above, Kritika, Pranith, and I were discussing an issue this morning where they had hit ping-timer expiry in replicated setups when disk usage was high. As Pranith pointed out, the reason for this was:
1. posix has some fops (like posix_xattrop and posix_fxattrop) which do syscalls while holding a lock on the inode (inode->lock).
2. During high disk-usage scenarios, syscall latencies were high (sometimes >= the ping-timeout value).
3. Before being handed over to a new thread at the io-threads xlator, a fop gets executed in one of the threads that read incoming messages from the socket. This execution path includes translators like protocol/server, index, quota-enforcer, and marker, and these translators might access the inode-ctx, which involves locking the inode (inode->lock). Due to this locking, the latency of the syscall gets transferred to the poller thread. Since the poller thread is waiting on inode->lock, it won't be able to read ping requests from the network in time, resulting in ping-timer expiry (see the sketch below).
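
A simplified sketch of that interaction (inode->lock is modelled here with a plain pthread mutex; function names are illustrative, not the actual GlusterFS code):

#include <pthread.h>
#include <sys/xattr.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in
                                                                  for
                                                                  inode->lock */

/* Step 1: a posix fop does a syscall while holding the inode lock. */
static void
posix_fxattrop_like (int fd)
{
        pthread_mutex_lock (&inode_lock);
        /* Under high disk usage this syscall can take seconds,
         * sometimes >= ping-timeout (step 2). */
        fsetxattr (fd, "trusted.example", "v", 1, 0);
        pthread_mutex_unlock (&inode_lock);
}

/* Step 3: a poller thread executing a fop inline touches the inode-ctx
 * and needs the same lock, so the syscall latency above is transferred
 * to it. While it waits here, it reads nothing from the network --
 * including ping requests -- and the ping timer expires. */
static void
poller_thread_fop_path (void)
{
        pthread_mutex_lock (&inode_lock);  /* blocks behind the syscall */
        /* ... inode_ctx_get()/inode_ctx_set() ... */
        pthread_mutex_unlock (&inode_lock);
}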

I think Kritika is working on a patch to eliminate the locking on the inode in 1 above. We also need to reduce the actual fop execution done in the poller thread. IOW, we need to hand over fop execution to io-threads/syncop-threads as early as we can. [3] helps in this scenario, as it adds the socket back for polling immediately after reading the entire msg but before execution of the fop begins. So, even though fop execution happens in a poller thread, msgs from the same connection can be read in other poller threads in parallel (and we can scale up the number of epoll threads when load is high).
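
A minimal sketch of that idea (illustrative only, not the actual change in [3]) using EPOLLONESHOT: the fd is re-armed as soon as the full message has been read, before the fop runs, so another poller thread can pick up the next message from the same connection:

#include <sys/epoll.h>

void *read_entire_msg (int sock);  /* hypothetical: reads one full RPC msg */
void  execute_fop (void *msg);     /* hypothetical: runs (or queues) the fop */

static void
handle_readable (int epfd, int sock)
{
        struct epoll_event ev = {
                .events  = EPOLLIN | EPOLLONESHOT,
                .data.fd = sock,
        };

        void *msg = read_entire_msg (sock);

        /* Re-arm the fd *before* executing the fop: another poller thread
         * can now read the next message (e.g. a ping request) from this
         * connection in parallel. */
        epoll_ctl (epfd, EPOLL_CTL_MOD, sock, &ev);

        /* Ideally this is handed to io-threads/syncop-threads as early
         * as possible instead of running inline. */
        execute_fop (msg);
}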

Also, note that there is no way we can send an entire ping request as "URGENT" data over the network (TCP urgent data is effectively a single byte). So, the prioritization in [1] applies only to the queue of messages waiting to be written to the network. So, though I suggested [1], the more I think of it, the less relevant it seems.
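
For illustration, head-of-queue prioritization on the transport's write queue looks roughly like this (a sketch with illustrative types, not the actual rpc-transport code or the change in [1]; it also covers the generic priority flag suggested in the note below):

#include <stddef.h>

struct out_msg {
        struct out_msg *next;
        /* ... iovecs, refs, etc. ... */
};

struct write_queue {
        struct out_msg *head;
        struct out_msg *tail;
};

/* Priority messages (e.g. ping requests) jump to the head of the queue
 * of messages waiting to be written to the network; everything else is
 * appended at the tail as usual. */
static void
enqueue_msg (struct write_queue *q, struct out_msg *m, int priority)
{
        if (priority) {
                m->next = q->head;
                q->head = m;
                if (!q->tail)
                        q->tail = m;
        } else {
                m->next = NULL;
                if (q->tail)
                        q->tail->next = m;
                else
                        q->head = m;
                q->tail = m;
        }
}

Of course, this reorders only our own send queue; once bytes have been handed to the kernel, a ping request still waits behind whatever is already buffered on the connection.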


Note: We could make this patch more generic so that any packet, not
just ping packets, can be marked as priority and added to the head of the queue.

[1] : http://review.gluster.org/#/c/11935/

Regards

Rafi KC







--
Raghavendra G
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel
