Re: Ceph and TCP States

Ilya Dryomov <idryomov@xxxxxxxxx> · Mon, 24 Oct 2016 15:44:51 +0200

On Mon, Oct 24, 2016 at 11:50 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Ilya Dryomov [mailto:idryomov@xxxxxxxxx]
>> Sent: 24 October 2016 10:33
>> To: Nick Fisk <nick@xxxxxxxxxx>
>> Cc: Yan, Zheng <ukernel@xxxxxxxxx>; Gregory Farnum <gfarnum@xxxxxxxxxx>; Zheng Yan <zyan@xxxxxxxxxx>; Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  Ceph and TCP States
>>
>> On Mon, Oct 24, 2016 at 11:29 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> -----Original Message-----
>> >> From: Yan, Zheng [mailto:ukernel@xxxxxxxxx]
>> >> Sent: 24 October 2016 10:19
>> >> To: Gregory Farnum <gfarnum@xxxxxxxxxx>
>> >> Cc: Nick Fisk <nick@xxxxxxxxxx>; Zheng Yan <zyan@xxxxxxxxxx>; Ceph
>> >> Users <ceph-users@xxxxxxxxxxxxxx>
>> >> Subject: Re:  Ceph and TCP States
>> >>
>> >> On Sat, Oct 22, 2016 at 4:14 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >> > On Fri, Oct 21, 2016 at 7:56 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> >>> -----Original Message-----
>> >> >>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>> >> >>> Behalf Of Haomai Wang
>> >> >>> Sent: 21 October 2016 15:40
>> >> >>> To: Nick Fisk <nick@xxxxxxxxxx>
>> >> >>> Cc: ceph-users@xxxxxxxxxxxxxx
>> >> >>> Subject: Re:  Ceph and TCP States
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Fri, Oct 21, 2016 at 10:31 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> >>> > -----Original Message-----
>> >> >>> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
>> >> >>> > On Behalf Of Haomai Wang
>> >> >>> > Sent: 21 October 2016 15:28
>> >> >>> > To: Nick Fisk <nick@xxxxxxxxxx>
>> >> >>> > Cc: ceph-users@xxxxxxxxxxxxxx
>> >> >>> > Subject: Re:  Ceph and TCP States
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > On Fri, Oct 21, 2016 at 10:19 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> >>> > Hi,
>> >> >>> >
>> >> >>> > I'm just testing out using a Ceph client in a DMZ behind a FW
>> >> >>> > from the main Ceph cluster. One thing I have noticed is that if
>> >> >>> > the state table on the FW is emptied, e.g. by restarting it or
>> >> >>> > just clearing the state table, the Ceph client will hang for a
>> >> >>> > long time, as the TCP session can no longer pass through the FW
>> >> >>> > and just gets blocked instead.
>> >> >>> >
>> >> >>> > Is this "FW" a Linux firewall or a hardware FW?
>> >> >>>
>> >> >>> pfSense running on dedicated HW. Eventually they will be in an HA
>> >> >>> pair so states should persist, but I'm trying to work around this
>> >> >>> for now. It's a bit annoying having CephFS lock hard for 15 minutes
>> >> >>> even though the network connection only went down for a few seconds.
>> >> >>>
>> >> >>>     Hmm, I'm not familiar with this FW. From my view, whether an
>> >> >>> RST packet is sent is decided by the FW. But I think you can try
>> >> >>> "/proc/sys/net/ipv4/tcp_keepalive_time"; if the FW resets the TCP
>> >> >>> session, TCP keepalive should detect it and send an RST.
>> >> >>
>> >> >> Yeah, I think that's where the problem lies. Most firewalls tend to
>> >> >> silently drop denied packets without sending RSTs, so Ceph
>> >> >> effectively just thinks that it's experiencing packet loss and will
>> >> >> not retry until the 15 minute timeout period is up. Am I right in
>> >> >> thinking I can't tune down this parameter for a CephFS kernel
>> >> >> client, as it doesn't use the ceph.conf file?
>> >> >
>> >> > The kernel client has a lot of mount options and can be configured
>> >> > in a few ways via debugfs et al; I think there's a setting for the
>> >> > timeout as well. If you can't find it, I'm sure Zheng knows. :)
>> >> > -Greg
>> >>
>> >> So far, there is no mount option to control the keepalive time for the client-to-MDS connection.
>> >
>> > I think, although I can't be 100% sure, that most of the problem is around client<->mon traffic. I'm pretty sure I saw a timeout to one of the
>> > mons flash up on the screen just before everything sprang back into life.
>>
>> Which kernel is this?  The kernel client <-> mon session has a 30 second keepalive timeout in recent kernels.
>
> Kernel is 4.8.
>
> I'm certainly not seeing connectivity come back in 30 seconds. I can't be sure about the 15 minutes I stated above, but it's around that figure. I also don't see any new TCP sessions established on the firewall, so it doesn't look like it's trying to establish a new TCP connection after 30s either. A reboot of the client is currently the fastest way to get everything working again.

Do you see anything in /sys/kernel/debug/ceph/<fsid>/osdc or mdsc when
this happens?
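
Something along these lines would collect both files in one go. It is only a rough sketch: it assumes debugfs is mounted at /sys/kernel/debug and that it is run as root, and dump_inflight is a made-up helper name, not part of any Ceph tooling. The glob is there because each kernel client instance gets its own directory under ceph/.

```python
import glob
import os

# Assumption: debugfs is mounted at the usual location.
DEBUGFS_CEPH = "/sys/kernel/debug/ceph"


def dump_inflight(base=DEBUGFS_CEPH, names=("osdc", "mdsc")):
    """Return {path: contents} for the osdc/mdsc file of every client instance."""
    out = {}
    for name in names:
        # One subdirectory per kernel client instance, so glob over all of them.
        for path in sorted(glob.glob(os.path.join(base, "*", name))):
            try:
                with open(path) as f:
                    out[path] = f.read()
            except OSError as e:
                # Typically a permissions problem when not running as root.
                out[path] = "<unreadable: %s>" % e
    return out


if __name__ == "__main__":
    for path, text in dump_inflight().items():
        print("==>", path)
        print(text if text.strip() else "(empty)")
```

Running it once while the client is hung and once after it recovers should make it easy to see which requests were stuck.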

Can you try reproducing with krbd?  If it's a mon session problem, it
should behave the same...
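
While reproducing, it may also help to watch the client-side TCP state of the mon connections, to compare with what the firewall shows. A minimal sketch, with some assumptions: mons listen on the default port 6789, the connections are IPv4, and the host is little-endian (e.g. x86), since /proc/net/tcp prints addresses as host-order hex. mon_sockets is a hypothetical helper name.

```python
import os
import socket
import struct

MON_PORT = 6789  # assumption: default monitor port, adjust if yours differs
PROC_NET_TCP = "/proc/net/tcp"  # IPv4 sockets only

# State codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def mon_sockets(text, remote_port=MON_PORT):
    """Return [(remote_ip, state)] for sockets whose remote port matches."""
    rows = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        rem_addr, rem_port = fields[2].split(":")
        if int(rem_port, 16) != remote_port:
            continue
        # Assumes a little-endian host: the hex value is the address bytes reversed.
        ip = socket.inet_ntoa(struct.pack("<I", int(rem_addr, 16)))
        rows.append((ip, TCP_STATES.get(fields[3], fields[3])))
    return rows


if __name__ == "__main__" and os.path.exists(PROC_NET_TCP):
    with open(PROC_NET_TCP) as f:
        for ip, state in mon_sockets(f.read()):
            print(ip, state)
```

If the socket stays in ESTABLISHED for the whole hang, the client never noticed that the firewall dropped the state.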

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com