Re: Intermittent client reconnect delay following node fail

On Thu, Aug 23, 2018 at 3:01 PM William Lawton
<william.lawton@xxxxxxxxxx> wrote:
>
> Hi John.
>
> Just picking up this thread again after coming back from leave. Our Ceph storage project has progressed, and we now make sure that the active MON and MDS are kept on separate nodes, which has reduced the incidence of delayed client reconnects on Ceph node failure. We've also disabled client blacklisting, which has prevented late clients from being permanently disconnected. However, we still have occasional slow client reconnects if we lose the active MON and MDS nodes at the same time (e.g. an AWS AZ failure scenario). Ideally, we would love to eradicate these slow reconnects entirely. One other thing we've noticed in our resiliency tests is that when we bring down a MON node, a MON re-election is always triggered, even if the stopped MON node was not the leader. Do you know if there is a way to configure Ceph so that a MON re-election happens only if the current MON leader is lost?

Hmm, I'm not sure exactly what the bounds are meant to be on how long
the mon cluster takes to recover from a peon failure.  However, if the
elections are taking an unreasonably long time, that would certainly
be a viable explanation for the strange reconnect behaviour: perhaps
the FSMap is being updated and most clients see it, but a few don't
see it until after an election.
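
One way to check whether an election really accompanies each peon
failure is to compare two snapshots of `ceph quorum_status --format json`,
one taken before the node is stopped and one after. The following is
only a minimal sketch: the field names (election_epoch, quorum_names,
quorum_leader_name) are assumed to match what your release prints, the
helper name describe_quorum_change is hypothetical, and the sample
snapshots are made up for illustration.

```python
import json


def describe_quorum_change(before_json: str, after_json: str) -> dict:
    """Compare two quorum_status snapshots (JSON strings).

    Assumes each snapshot carries the keys election_epoch,
    quorum_names and quorum_leader_name; verify against your release.
    """
    before = json.loads(before_json)
    after = json.loads(after_json)
    return {
        # The election epoch moves whenever the mons hold an election.
        "election_happened": after["election_epoch"] != before["election_epoch"],
        # True only if a different mon ended up as leader.
        "leader_changed": after["quorum_leader_name"] != before["quorum_leader_name"],
        # Mons that were in quorum before but are missing afterwards.
        "lost_mons": sorted(set(before["quorum_names"]) - set(after["quorum_names"])),
    }


# Hypothetical snapshots: a peon (dub-ceph-03) is stopped, the leader stays.
before = json.dumps({
    "election_epoch": 42,
    "quorum_names": ["dub-ceph-01", "dub-ceph-02", "dub-ceph-03", "dub-ceph-04"],
    "quorum_leader_name": "dub-ceph-01",
})
after = json.dumps({
    "election_epoch": 44,
    "quorum_names": ["dub-ceph-01", "dub-ceph-02", "dub-ceph-04"],
    "quorum_leader_name": "dub-ceph-01",
})

change = describe_quorum_change(before, after)
print(change)
```

Recording the wall-clock time of each snapshot alongside the client
logs would show whether the slow reconnects line up with the elections.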

John

>
> Thanks
>
> William Lawton
>
> -----Original Message-----
> From: William Lawton
> Sent: Wednesday, August 01, 2018 2:05 PM
> To: 'John Spray' <jspray@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx; Mark Standley <Mark.Standley@xxxxxxxxxx>
> Subject: RE:  Intermittent client reconnect delay following node fail
>
> I didn't lose any clients this time around; all clients reconnected within at most 21 seconds. We think the very long client disconnections occurred when both the mgr and mds were active on the failed node, which was not the case for any of my recent 10 tests. We have noticed entries in the client logs like the following:
>
> Aug  1 10:39:06 dub-ditv-sim-goldenimage kernel: libceph: mon0 10.18.49.35:6789 session lost, hunting for new mon
>
> We're currently exploring whether keeping the mds and mon daemons on separate servers has less impact on the client when either one is lost.
>
> William Lawton
>
> -----Original Message-----
> From: John Spray <jspray@xxxxxxxxxx>
> Sent: Wednesday, August 01, 2018 1:14 PM
> To: William Lawton <william.lawton@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx; Mark Standley <Mark.Standley@xxxxxxxxxx>
> Subject: Re:  Intermittent client reconnect delay following node fail
>
> On Wed, Aug 1, 2018 at 12:09 PM William Lawton <william.lawton@xxxxxxxxxx> wrote:
> >
> > Thanks for the advice John.
> >
> > Our CentOS 7 clients use Linux kernel v3.10, so I upgraded one of them to v4.17 and have run 10 more node fail tests. Unfortunately, the kernel upgrade on the client hasn't resolved the issue.
> >
> > With each test I took down the active MDS node and monitored how long the two v3.10 clients and the v4.17 client lost the ceph mount for. There wasn't much difference between them: the v3.10 clients lost the mount for between 0 and 21 seconds and the v4.17 client for between 0 and 16 seconds. Sometimes each client lost the mount at a different time, seconds apart. Other times, two clients would lose and recover the mount at exactly the same time and the third would lose and recover it some time later.
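
For comparing runs like these, it may help to turn the raw probe
results from each client into outage durations in a uniform way. The
sketch below assumes each client records (timestamp, mount_ok) samples
at a fixed interval; the function name outage_durations and the sample
data are hypothetical, and how the probe itself is done (for example a
stat of the mount point run with a timeout, since a hung mount can
block) is left out.

```python
from typing import Iterable, List, Tuple


def outage_durations(samples: Iterable[Tuple[float, bool]]) -> List[float]:
    """Return the length in seconds of each contiguous run of failed probes.

    Each sample is (timestamp_seconds, mount_ok). An outage runs from the
    first failed probe to the next successful one. An outage that is still
    open when the samples end is not reported.
    """
    outages: List[float] = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts          # first failed probe of a new outage
        elif ok and down_since is not None:
            outages.append(ts - down_since)
            down_since = None        # mount is back
    return outages


# Hypothetical probe log from one client during a single failover test.
probe_log = [(0.0, True), (1.0, False), (2.0, False), (17.0, True), (18.0, True)]
print(outage_durations(probe_log))
```

The measured resolution is limited by the probe interval, so a
one-second probe cannot distinguish a millisecond reconnect from one
that takes most of a second.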
> >
> > We are novices with Ceph so are not really sure what we should expect from it regarding resilience. Is it normal for clients to lose the mount point for a period of time and, if so, how long a period should we consider abnormal?
>
> So with the more recent kernel you're finding the clients do reliably reconnect, there's just some variation in the time it takes?  Or are you still losing some clients entirely?
>
> John
>
>
> >
> > William Lawton
> >
> > -----Original Message-----
> > From: John Spray <jspray@xxxxxxxxxx>
> > Sent: Tuesday, July 31, 2018 11:17 AM
> > To: William Lawton <william.lawton@xxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx; Mark Standley <Mark.Standley@xxxxxxxxxx>
> > Subject: Re:  Intermittent client reconnect delay following node fail
> >
> > On Tue, Jul 31, 2018 at 12:33 AM William Lawton <william.lawton@xxxxxxxxxx> wrote:
> > >
> > > Hi.
> > >
> > >
> > >
> > > We have recently set up our first Ceph cluster (4 nodes), but our node failure tests have revealed an intermittent problem. When we take down a node (i.e. by powering it off), most of the time all clients reconnect to the cluster within milliseconds, but occasionally it can take them 30 seconds or more. All clients are CentOS 7 instances and have the Ceph cluster mount point configured in /etc/fstab as follows:
> >
> > The first thing I'd do is make sure you've got recent client code --
> > there are backports in RHEL but I'm unclear on how much of that (if
> > any) makes it into CentOS.  You may find it simpler to just install a recent 4.x kernel from ELRepo.  Even if you don't want to use that in production, it would be useful to try and isolate any CephFS client issues you're encountering.
> >
> > John
> >
> > >
> > >
> > >
> > > 10.18.49.35:6789,10.18.49.204:6789,10.18.49.101:6789,10.18.49.183:6789:/ /mnt/ceph ceph name=admin,secretfile=/etc/ceph_key,noatime,_netdev    0       2
> > >
> > >
> > >
> > > On rare occasions, using the ls command, we can see that a failover has left a client’s /mnt/ceph directory with the following state: “???????????  ? ?    ?       ?            ? ceph”. When this occurs, we think that the client has failed to connect within 45 seconds (the mds_reconnect_timeout period) so the client has been evicted. We can reproduce this circumstance by reducing the mds reconnect timeout down to 1 second.
> > >
> > >
> > >
> > > We’d like to know why our clients sometimes struggle to reconnect after a cluster node failure, and how we can ensure that all clients consistently reconnect to the cluster quickly following a node failure.
> > >
> > >
> > >
> > > We are using the default configuration options.
> > >
> > >
> > >
> > > Ceph Status:
> > >
> > >   cluster:
> > >     id:     ea2d9095-3deb-4482-bf6c-23229c594da4
> > >     health: HEALTH_OK
> > >
> > >   services:
> > >     mon: 4 daemons, quorum dub-ceph-01,dub-ceph-03,dub-ceph-04,dub-ceph-02
> > >     mgr: dub-ceph-02(active), standbys: dub-ceph-04.ott.local, dub-ceph-01, dub-ceph-03
> > >     mds: cephfs-1/1/1 up  {0=dub-ceph-03=up:active}, 3 up:standby
> > >     osd: 4 osds: 4 up, 4 in
> > >
> > >   data:
> > >     pools:   2 pools, 200 pgs
> > >     objects: 2.36 k objects, 8.9 GiB
> > >     usage:   31 GiB used, 1.9 TiB / 2.0 TiB avail
> > >     pgs:     200 active+clean
> > >
> > >
> > > Thanks
> > >
> > > William Lawton
> > >
> > >
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com