Re: [nautilus][mds] MDS fall into ReadOnly mode

Frank Yu <flyxiaoyu@xxxxxxxxx> · Thu, 30 Jul 2020 19:09:34 +0800

I found the following errors in mds.log:

2020-07-30 18:14:38.574 7f6473346700  0 log_channel(cluster) log [WRN] : 94 slow requests, 0 included below; oldest blocked for > 619910.524984 secs
2020-07-30 18:14:43.574 7f6473346700  0 log_channel(cluster) log [WRN] : 94 slow requests, 0 included below; oldest blocked for > 619915.525079 secs
2020-07-30 18:14:44.835 7f646f33e700 -1 mds.0.159432 unhandled write error (90) Message too long, force readonly...
2020-07-30 18:14:44.835 7f646f33e700  1 mds.0.cache force file system read-only
2020-07-30 18:14:44.835 7f646f33e700  0 log_channel(cluster) log [WRN] : force file system read-only
2020-07-30 18:15:18.000 7f6473346700  0 log_channel(cluster) log [WRN] : 114 slow requests, 5 included below; oldest blocked for > 619949.950199 secs
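
To make excerpts like this easier to read, here is a minimal Python sketch that pulls out the write error that triggered read-only mode and converts the "oldest blocked" time into days. It is not part of Ceph: the function name summarize and both regular expressions are my own assumptions, based only on the line formats shown in this mail, and the sample lines are copied from the log with the mail client's line wraps joined.

```python
import re

# Sample lines copied from the mds.log excerpt, with wrapped lines joined.
SAMPLE_LOG = """\
2020-07-30 18:14:38.574 7f6473346700  0 log_channel(cluster) log [WRN] : 94 slow requests, 0 included below; oldest blocked for > 619910.524984 secs
2020-07-30 18:14:44.835 7f646f33e700 -1 mds.0.159432 unhandled write error (90) Message too long, force readonly...
2020-07-30 18:15:18.000 7f6473346700  0 log_channel(cluster) log [WRN] : 114 slow requests, 5 included below; oldest blocked for > 619949.950199 secs
"""

# Assumed patterns, derived only from the lines above.
SLOW_RE = re.compile(
    r"(\d+) slow requests, \d+ included below; oldest blocked for > ([\d.]+) secs"
)
WRITE_ERR_RE = re.compile(r"unhandled write error \((\d+)\) (.+?), force readonly")


def summarize(log_text):
    """Return (write_errors, slow_reports) found in log_text.

    write_errors: list of (timestamp, errno, message)
    slow_reports: list of (timestamp, request_count, oldest_blocked_days)
    """
    write_errors = []
    slow_reports = []
    for line in log_text.splitlines():
        timestamp = line[:23]  # "YYYY-MM-DD HH:MM:SS.mmm"
        match = WRITE_ERR_RE.search(line)
        if match:
            write_errors.append((timestamp, int(match.group(1)), match.group(2)))
            continue
        match = SLOW_RE.search(line)
        if match:
            days = float(match.group(2)) / 86400  # seconds per day
            slow_reports.append((timestamp, int(match.group(1)), round(days, 2)))
    return write_errors, slow_reports


if __name__ == "__main__":
    errors, slow = summarize(SAMPLE_LOG)
    for ts, code, msg in errors:
        print(f"{ts}  write error {code}: {msg}")
    for ts, count, days in slow:
        print(f"{ts}  {count} slow requests, oldest blocked about {days} days")
```

On the lines above, this reports a single write error (90, "Message too long") at 18:14:44.835, and shows that the oldest slow request had already been blocked for a little over seven days when the MDS went read-only.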

On Thu, Jul 30, 2020 at 6:55 PM Frank Yu <flyxiaoyu@xxxxxxxxx> wrote:

> Hi guys,
>
> I have a Ceph cluster with three MDS servers: two are active and the
> remaining one is in standby-replay mode. Today the message
> '1 MDSs are read only' showed up when I checked the cluster status with
> 'ceph -s'. Details below:
>
> # ceph -s
>   cluster:
>     id:     3d43e9a5-50dc-4f84-9493-656bf4f06f8c
>     health: HEALTH_WARN
>             5 clients failing to advance oldest client/flush tid
>             1 MDSs are read only
>             2 MDSs report slow requests
>             2 MDSs behind on trimming
>             BlueFS spillover detected on 33 OSD(s)
>
>   services:
>     mon: 3 daemons, quorum bjcpu-001,bjcpu-002,bjcpu-003 (age 3M)
>     mgr: bjcpu-001.xxxx.io(active, since 3M), standbys: bjcpu-003.xxxx.io, bjcpu-002.xxxx.io
>     mds: cephfs:2 {0=bjcpu-003.xxxx.io=up:active,1=bjcpu-001.xxxx.io=up:active} 1 up:standby-replay
>     osd: 48 osds: 48 up (since 7w), 48 in (since 7M)
>
>   data:
>     pools:   3 pools, 2304 pgs
>     objects: 301.35M objects, 70 TiB
>     usage:   246 TiB used, 280 TiB / 527 TiB avail
>     pgs:     2295 active+clean
>              9    active+clean+scrubbing+deep
>
>   io:
>     client:   254 B/s rd, 44 MiB/s wr, 0 op/s rd, 15 op/s wr
>
> What should I do to fix this error? The cluster still seems to work
> fine (clients can read and write).
>
> Many thanks
>
>
> --
> Regards
> Frank Yu
>

-- 
Regards
Frank Yu
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx