No, I think I should first find out why it fell into read-only mode, and
then find the correct method to fix it.... Just restarting the MDS is
dangerous, I think.

On Thu, Jul 30, 2020 at 7:35 PM sathvik vutukuri <7vik.sathvik@xxxxxxxxx>
wrote:

> Have you tried a restart of the MDS?
>
> On Thu, 30 Jul 2020, 16:40 Frank Yu, <flyxiaoyu@xxxxxxxxx> wrote:
>
>> I got some errors from mds.log, as below:
>>
>> 2020-07-30 18:14:38.574 7f6473346700  0 log_channel(cluster) log [WRN] :
>> 94 slow requests, 0 included below; oldest blocked for > 619910.524984 secs
>> 2020-07-30 18:14:43.574 7f6473346700  0 log_channel(cluster) log [WRN] :
>> 94 slow requests, 0 included below; oldest blocked for > 619915.525079 secs
>> 2020-07-30 18:14:44.835 7f646f33e700 -1 mds.0.159432 unhandled write error
>> (90) Message too long, force readonly...
>> 2020-07-30 18:14:44.835 7f646f33e700  1 mds.0.cache force file system
>> read-only
>> 2020-07-30 18:14:44.835 7f646f33e700  0 log_channel(cluster) log [WRN] :
>> force file system read-only
>> 2020-07-30 18:15:18.000 7f6473346700  0 log_channel(cluster) log [WRN] :
>> 114 slow requests, 5 included below; oldest blocked for > 619949.950199 secs
>>
>>
>> On Thu, Jul 30, 2020 at 6:55 PM Frank Yu <flyxiaoyu@xxxxxxxxx> wrote:
>>
>> > Hi guys,
>> >
>> > I have a Ceph cluster with three MDS servers: two are active, and the
>> > third is in standby-replay mode. Today the message '1 MDSs are read
>> > only' showed up when I checked the cluster status with 'ceph -s';
>> > details below:
>> >
>> > # ceph -s
>> >   cluster:
>> >     id:     3d43e9a5-50dc-4f84-9493-656bf4f06f8c
>> >     health: HEALTH_WARN
>> >             5 clients failing to advance oldest client/flush tid
>> >             1 MDSs are read only
>> >             2 MDSs report slow requests
>> >             2 MDSs behind on trimming
>> >             BlueFS spillover detected on 33 OSD(s)
>> >
>> >   services:
>> >     mon: 3 daemons, quorum bjcpu-001,bjcpu-002,bjcpu-003 (age 3M)
>> >     mgr: bjcpu-001.xxxx.io(active, since 3M), standbys:
>> > bjcpu-003.xxxx.io, bjcpu-002.xxxx.io
>> >     mds: cephfs:2 {0=bjcpu-003.xxxx.io=up:active,1=bjcpu-001.xxxx.io=up:active}
>> >          1 up:standby-replay
>> >     osd: 48 osds: 48 up (since 7w), 48 in (since 7M)
>> >
>> >   data:
>> >     pools:   3 pools, 2304 pgs
>> >     objects: 301.35M objects, 70 TiB
>> >     usage:   246 TiB used, 280 TiB / 527 TiB avail
>> >     pgs:     2295 active+clean
>> >              9    active+clean+scrubbing+deep
>> >
>> >   io:
>> >     client:   254 B/s rd, 44 MiB/s wr, 0 op/s rd, 15 op/s wr
>> >
>> > What should I do to clear the error? The cluster still seems to work
>> > fine (clients can read and write).
>> >
>> > Many thanks
>> >
>> > --
>> > Regards
>> > Frank Yu

--
Regards
Frank Yu
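
Error (90) is EMSGSIZE ("Message too long"): the MDS submitted a single
write to the metadata pool that the OSDs rejected as too large. A cause
commonly reported for this symptom is an object exceeding
osd_max_write_size (90 MB by default), often an oversized session map,
which would be consistent with the '5 clients failing to advance oldest
client/flush tid' warning above. The sketch below assumes that cause;
the 256 MB value is only an example, and the commands should be verified
against your Ceph release before running them on a production cluster:

# confirm which MDS is read-only and review the related warnings
ceph health detail
ceph fs status

# temporarily raise the per-write limit on all OSDs (value is in MB;
# 256 is an assumed example, not a recommendation)
ceph tell osd.* injectargs '--osd_max_write_size 256'

# fail the read-only rank so a standby takes over and the daemon
# comes back in a writable state (rank 0, per the "mds.0" log lines)
ceph mds fail 0

If an oversized session map really is the culprit, the failover lets the
replacement MDS flush it under the raised limit; afterwards the limit can
be returned to its default the same way.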