Stuck in replay?

Hi All,

We have a somewhat serious situation with a CephFS filesystem (18.2.1) that has 2 active MDSs and one standby. I tried to restart one of the active daemons to unstick a bunch of blocked requests, and the standby went into 'replay' for a very long time. Then RAM on that MDS server filled up, and it stayed that way for a while before it eventually appeared to give up and switched to the standby, but the cycle started again. So I restarted that MDS, and now I'm in a situation where I see this:

# ceph fs status
slugfs - 29 clients
======
RANK   STATE            MDS            ACTIVITY   DNS    INOS   DIRS   CAPS
 0     replay  slugfs.pr-md-01.xdtppo            3958k  57.1k  12.2k     0
 1    resolve  slugfs.pr-md-02.sbblqq               0      3      1      0
       POOL           TYPE     USED  AVAIL
 cephfs_metadata    metadata   997G  2948G
cephfs_md_and_data    data       0   87.6T
   cephfs_data        data     773T   175T
     STANDBY MDS
slugfs.pr-md-03.mclckv
MDS version: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)

It just stays there indefinitely. All my clients are hung. I tried restarting all MDS daemons and they just went back to this state after coming back up.

Is there any way I can somehow escape this state of indefinite replay/resolve?

Thanks so much! I'm kinda nervous since none of my clients have filesystem access at the moment...

cheers,
erich
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx