No, I think I should first find out why it fell into read-only mode, and
then find the correct method to fix it.... Just restarting the MDS is
dangerous, I think.

On Thu, Jul 30, 2020 at 7:35 PM sathvik vutukuri <7vik.sathvik@xxxxxxxxx>
wrote:

> Have you tried a restart of the MDS?
>
> On Thu, 30 Jul 2020, 16:40 Frank Yu, <flyxiaoyu@xxxxxxxxx> wrote:
>
>> I got some errors from mds.log, as below:
>>
>> 2020-07-30 18:14:38.574 7f6473346700  0 log_channel(cluster) log [WRN] :
>> 94 slow requests, 0 included below; oldest blocked for > 619910.524984 secs
>> 2020-07-30 18:14:43.574 7f6473346700  0 log_channel(cluster) log [WRN] :
>> 94 slow requests, 0 included below; oldest blocked for > 619915.525079 secs
>> 2020-07-30 18:14:44.835 7f646f33e700 -1 mds.0.159432 unhandled write error
>> (90) Message too long, force readonly...
>> 2020-07-30 18:14:44.835 7f646f33e700  1 mds.0.cache force file system
>> read-only
>> 2020-07-30 18:14:44.835 7f646f33e700  0 log_channel(cluster) log [WRN] :
>> force file system read-only
>> 2020-07-30 18:15:18.000 7f6473346700  0 log_channel(cluster) log [WRN] :
>> 114 slow requests, 5 included below; oldest blocked for > 619949.950199 secs
>>
>>
>> On Thu, Jul 30, 2020 at 6:55 PM Frank Yu <flyxiaoyu@xxxxxxxxx> wrote:
>>
>> > Hi guys,
>> >
>> > I have a Ceph cluster with three MDS servers: two are active, and the
>> > third is in standby-replay mode. Today the message '1 MDSs are read
>> > only' showed up when I checked the cluster status with 'ceph -s';
>> > details below:
>> >
>> > # ceph -s
>> >   cluster:
>> >     id:     3d43e9a5-50dc-4f84-9493-656bf4f06f8c
>> >     health: HEALTH_WARN
>> >             5 clients failing to advance oldest client/flush tid
>> >             1 MDSs are read only
>> >             2 MDSs report slow requests
>> >             2 MDSs behind on trimming
>> >             BlueFS spillover detected on 33 OSD(s)
>> >
>> >   services:
>> >     mon: 3 daemons, quorum bjcpu-001,bjcpu-002,bjcpu-003 (age 3M)
>> >     mgr: bjcpu-001.xxxx.io(active, since 3M), standbys:
>> > bjcpu-003.xxxx.io, bjcpu-002.xxxx.io
>> >     mds: cephfs:2 {0=bjcpu-003.xxxx.io=up:active,1=bjcpu-001.xxxx.io=up:active}
>> >          1 up:standby-replay
>> >     osd: 48 osds: 48 up (since 7w), 48 in (since 7M)
>> >
>> >   data:
>> >     pools:   3 pools, 2304 pgs
>> >     objects: 301.35M objects, 70 TiB
>> >     usage:   246 TiB used, 280 TiB / 527 TiB avail
>> >     pgs:     2295 active+clean
>> >              9    active+clean+scrubbing+deep
>> >
>> >   io:
>> >     client:   254 B/s rd, 44 MiB/s wr, 0 op/s rd, 15 op/s wr
>> >
>> > What should I do to clear the error? The cluster still seems to work
>> > fine (clients can read and write).
>> >
>> > Many thanks
>> >
>> > --
>> > Regards
>> > Frank Yu

--
Regards
Frank Yu
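
Error (90) is EMSGSIZE ("Message too long"): the MDS submitted a single
write to the metadata pool that the OSDs rejected as too large. A cause
commonly reported for this symptom is an object exceeding
osd_max_write_size (90 MB by default), often an oversized session map,
which would be consistent with the '5 clients failing to advance oldest
client/flush tid' warning above. The sketch below assumes that cause;
the 256 MB value is only an example, and the commands should be verified
against your Ceph release before running them on a production cluster:

# confirm which MDS is read-only and review the related warnings
ceph health detail
ceph fs status

# temporarily raise the per-write limit on all OSDs (value is in MB;
# 256 is an assumed example, not a recommendation)
ceph tell osd.* injectargs '--osd_max_write_size 256'

# fail the read-only rank so a standby takes over and the daemon
# comes back in a writable state (rank 0, per the "mds.0" log lines)
ceph mds fail 0

If an oversized session map really is the culprit, the failover lets the
replacement MDS flush it under the raised limit; afterwards the limit can
be returned to its default the same way.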