Re: [PATCH] ceph: reconnect connection if session hang in opening state

On Wed, Aug 28, 2019 at 8:05 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Wed, 2019-08-28 at 17:48 +0800, chenerqi@xxxxxxxxx wrote:
> > From: Erqi Chen <chenerqi@xxxxxxxxx>
> >
> > If a client's MDS session is evicted while it is in the
> > CEPH_MDS_SESSION_OPENING state, the MDS won't send a session message
> > to the client, and delayed_work() skips sessions in that state, so the
> > session hangs forever. Have ceph_con_keepalive() reconnect the
> > connection for CEPH_MDS_SESSION_OPENING sessions to avoid the hang.
> >
> > Fixes: https://tracker.ceph.com/issues/41551
> > Signed-off-by: Erqi Chen <chenerqi@xxxxxxxxx>
> > ---
> >  fs/ceph/mds_client.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 920e9f0..eee4b63 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -4044,7 +4044,7 @@ static void delayed_work(struct work_struct *work)
> >                               pr_info("mds%d hung\n", s->s_mds);
> >                       }
> >               }
> > -             if (s->s_state < CEPH_MDS_SESSION_OPEN) {
> > +             if (s->s_state < CEPH_MDS_SESSION_OPENING) {
> >                       /* this mds is failed or recovering, just wait */
> >                       ceph_put_mds_session(s);
> >                       continue;
>
> Just for my own edification:
>
> OPENING == we've sent (or are sending) the session open request
> OPEN == we've gotten the reply from the MDS and it was successful
>
> So in this case, the client got blacklisted after sending the request
> but before the reply? Ok.
>
> So this should make it send a keepalive (or cap) message, at which point
> the client discovers the connection is closed and then goes to reconnect
> the session. This sounds sane to me, but I wonder if this would be
> better expressed as:
>
>     if (s->s_state == CEPH_MDS_SESSION_NEW)
>

We should also avoid sending a keepalive for sessions in the
CEPH_MDS_SESSION_RESTARTING and CEPH_MDS_SESSION_REJECTED states.
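
Combining that with the explicit comparison suggested above, the check could look roughly like the untested userspace sketch below. The enum is a local stand-in, not the kernel header: it lists only the states named in this thread, and its numeric values are assumptions. skip_keepalive() and skip_keepalive_selftest() are hypothetical helpers that exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-in for the session states discussed in this thread.
 * Assumption: the values are illustrative and the list is not complete;
 * the real definitions live in fs/ceph/mds_client.h.
 */
enum {
	CEPH_MDS_SESSION_NEW = 1,
	CEPH_MDS_SESSION_OPENING,
	CEPH_MDS_SESSION_OPEN,
	CEPH_MDS_SESSION_RESTARTING,
	CEPH_MDS_SESSION_REJECTED,
};

/*
 * Hypothetical helper: returns true when the periodic work should skip
 * the session (no keepalive or cap renewal). Every state is compared
 * explicitly, so the result does not depend on the enum's numeric order.
 */
bool skip_keepalive(int s_state)
{
	return s_state == CEPH_MDS_SESSION_NEW ||
	       s_state == CEPH_MDS_SESSION_RESTARTING ||
	       s_state == CEPH_MDS_SESSION_REJECTED;
}

/* Hypothetical self-check covering the states named in the thread. */
void skip_keepalive_selftest(void)
{
	assert(skip_keepalive(CEPH_MDS_SESSION_NEW));
	assert(skip_keepalive(CEPH_MDS_SESSION_RESTARTING));
	assert(skip_keepalive(CEPH_MDS_SESSION_REJECTED));
	/* OPENING must not be skipped: this is the hang being fixed. */
	assert(!skip_keepalive(CEPH_MDS_SESSION_OPENING));
	assert(!skip_keepalive(CEPH_MDS_SESSION_OPEN));
}
```

In delayed_work() the same three comparisons would be written directly against s->s_state in place of the "<" test, keeping the existing ceph_put_mds_session(s); continue; body.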



> It always seems odd to me that we rely on the numerical values in this
> enum. That said, we do that all over the code, so I'm inclined to just
> merge this as-is (assuming Zheng concurs).
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>


