I'm running Jewel 10.2.5 on my production CephFS cluster and ran into the following ceph status.
I've tried rebooting both MDS servers. I've started a rolling reboot across all of my OSD nodes, but each node takes about 10 minutes to fully rejoin, so it's going to take a while. Any recommendations other than rebooting?
Attached are my mds logs during the failure.
[ceph-admin@mds1 brady]$ ceph status
    cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
     health HEALTH_WARN
            mds0: Behind on trimming (2718/30)
            mds0: MDS in read-only mode
     monmap e17: 5 mons at {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
            election epoch 378, quorum 0,1,2,3,4 mon0,mon1,mon2,osd2,osd3
      fsmap e6817: 1/1/1 up {0=mds0=up:active}, 1 up:standby
     osdmap e172126: 235 osds: 235 up, 235 in
            flags sortbitwise,require_jewel_osds
      pgmap v18008949: 5696 pgs, 2 pools, 291 TB data, 112 Mobjects
            874 TB used, 407 TB / 1282 TB avail
                5670 active+clean
                  13 active+clean+scrubbing+deep
                  13 active+clean+scrubbing
  client io 760 B/s rd, 0 op/s rd, 0 op/s wr
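For context, these are the kinds of commands I've been poking at so far; a sketch only, assuming the MDS admin socket is at its default path and that raising the segment limit is an acceptable stopgap (the 200 value is an arbitrary example, not a recommendation):

```shell
# Inspect the MDS journal counters to see how far trimming is behind
# (run on the host where mds0 is active).
ceph daemon mds.mds0 perf dump mds_log

# Temporarily raise the trimming threshold so the warning clears while
# the MDS catches up (the Jewel default is 30 segments; 200 is just an
# example value).
ceph tell mds.0 injectargs '--mds_log_max_segments=200'

# The read-only flag normally only clears on an MDS restart/failover,
# which hands rank 0 to the standby.
ceph mds fail 0
```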
Any ideas?
Attachment:
mds0
Description: Binary data
Attachment:
mds1
Description: Binary data
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com