On Mon, Oct 24, 2016 at 11:29 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Yan, Zheng [mailto:ukernel@xxxxxxxxx]
>> Sent: 24 October 2016 10:19
>> To: Gregory Farnum <gfarnum@xxxxxxxxxx>
>> Cc: Nick Fisk <nick@xxxxxxxxxx>; Zheng Yan <zyan@xxxxxxxxxx>; Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re: Ceph and TCP States
>>
>> On Sat, Oct 22, 2016 at 4:14 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> > On Fri, Oct 21, 2016 at 7:56 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >>> -----Original Message-----
>> >>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Haomai Wang
>> >>> Sent: 21 October 2016 15:40
>> >>> To: Nick Fisk <nick@xxxxxxxxxx>
>> >>> Cc: ceph-users@xxxxxxxxxxxxxx
>> >>> Subject: Re: Ceph and TCP States
>> >>>
>> >>> On Fri, Oct 21, 2016 at 10:31 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >>> > -----Original Message-----
>> >>> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Haomai Wang
>> >>> > Sent: 21 October 2016 15:28
>> >>> > To: Nick Fisk <nick@xxxxxxxxxx>
>> >>> > Cc: ceph-users@xxxxxxxxxxxxxx
>> >>> > Subject: Re: Ceph and TCP States
>> >>> >
>> >>> > On Fri, Oct 21, 2016 at 10:19 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I'm just testing out using a Ceph client in a DMZ behind a FW from
>> >>> > the main Ceph cluster. One thing I have noticed is that if the
>> >>> > state table on the FW is emptied, maybe by restarting it or just
>> >>> > clearing the state table etc., then the Ceph client will hang for
>> >>> > a long time, as the TCP session can no longer pass through the FW
>> >>> > and just gets blocked instead.
>> >>> >
>> >>> > Is this "FW" a Linux firewall or a hardware FW?
>> >>>
>> >>> PFSense running on dedicated HW. Eventually they will be in a HA pair
>> >>> so states should persist, but trying to work around this for now.
>> >>> Bit annoying having CephFS lock hard for 15 minutes even though the
>> >>> network connection only went down for a few seconds.
>> >>>
>> >>> Hmm, I'm not familiar with this FW. From my view, whether an RST
>> >>> packet is sent is decided by the FW. But I think you can try
>> >>> "/proc/sys/net/ipv4/tcp_keepalive_time"; if the FW resets the TCP
>> >>> session, TCP keepalive should detect it and send an RST.
>> >>
>> >> Yeah, I think that's where the problem lies. Most firewalls tend to
>> >> silently drop denied packets without sending RSTs, so Ceph effectively
>> >> just thinks that it's experiencing packet loss and will never retry
>> >> until the 15 minute timeout period is up. Am I right in thinking I
>> >> can't tune down this parameter for a CephFS kernel client, as it
>> >> doesn't use the ceph.conf file?
>> >
>> > The kernel client has a lot of mount options and can be configured in
>> > a few ways via debugfs et al; I think there's a setting for the
>> > timeout as well. If you can't find it, I'm sure Zheng knows. :)
>> > -Greg
>>
>> So far, there is no mount option to control keepalive time for the
>> client-to-mds connection.
>
> I think, although I can't be 100% sure, that most of the problem is around
> client<->mon traffic. I'm pretty sure I saw a timeout to one of the mons
> flash up on the screen just before everything sprang back into life.

Which kernel is this? The kernel client <-> mon session has a 30 second
keepalive timeout in recent kernels.

Thanks,

                Ilya
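
For reference, the /proc/sys/net/ipv4/tcp_keepalive_time knob suggested earlier in the thread is one of three Linux keepalive sysctls (the others being tcp_keepalive_intvl and tcp_keepalive_probes). The rough Python sketch below shows how they combine into a worst-case detection time for a peer that has gone silent, and how a userspace program can override them per socket. It is only an illustration: the function names are made up for this example, the sysctls only affect sockets that have SO_KEEPALIVE enabled, and nothing in this thread establishes that the CephFS kernel client's connections are governed by them.

```python
import socket
from pathlib import Path

# Stock Linux defaults, used as a fallback when /proc is not readable
# (for example when this is run on a non-Linux machine).
LINUX_DEFAULTS = {
    "tcp_keepalive_time": 7200,   # idle seconds before the first probe
    "tcp_keepalive_intvl": 75,    # seconds between unanswered probes
    "tcp_keepalive_probes": 9,    # unanswered probes before giving up
}


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the three keepalive sysctls, falling back to the defaults."""
    values = {}
    for name, default in LINUX_DEFAULTS.items():
        try:
            values[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            values[name] = default
    return values


def detection_seconds(idle, interval, probes):
    """Worst-case time for keepalive to declare a silent peer dead."""
    return idle + interval * probes


def enable_keepalive(sock, idle=30, interval=10, probes=3):
    """Enable keepalive on one socket with per-socket overrides.

    The idle/interval/probes values here are arbitrary example numbers.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    overrides = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    )
    for opt, value in overrides:
        if hasattr(socket, opt):  # these constants are platform-dependent
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt), value)


if __name__ == "__main__":
    s = read_keepalive_sysctls()
    total = detection_seconds(
        s["tcp_keepalive_time"],
        s["tcp_keepalive_intvl"],
        s["tcp_keepalive_probes"],
    )
    print(f"keepalive would give up on a silent peer after about {total} s")
```

With the stock defaults this works out to 7200 + 75 * 9 = 7875 seconds, so lowering tcp_keepalive_time alone only helps for sockets that actually have keepalive switched on.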