On Fri, Nov 4, 2022 at 9:36 AM Galzin Rémi <rgalzin@xxxxxxxxxx> wrote:
>
> Hi,
> I'm looking for some help/ideas/advice in order to solve the problem
> that occurs on my metadata server after the server reboot.

You rebooted an MDS's host and your file system became read-only? Was
the Ceph cluster healthy before the reboot? Any issues with the MDSs or
OSDs? Did this happen after an upgrade?

> "Ceph status" warns about my MDS being "read only" but the filesystem
> and the data seem healthy.
> It is still possible to access the content of my cephfs volumes since
> it's read only, but I don't know how to make my filesystem writable again.
>
> Logs keep showing the same error when I restart the MDS server:
>
> 2022-11-04T11:50:14.506+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map state change up:reconnect --> up:rejoin
> 2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_start
> 2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_joint_start
> 2022-11-04T11:50:14.702+0100 7fbbf83c2700  1 mds.0.6872 rejoin_done
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.node3-5 Updating MDS map to version 6881 from mon.3
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map i am now mds.0.6872
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map state change up:rejoin --> up:active
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 recovery_done -- successful recovery!
> 2022-11-04T11:50:15.550+0100 7fbbf83c2700  1 mds.0.6872 active_start
> 2022-11-04T11:50:15.558+0100 7fbbf83c2700  1 mds.0.6872 cluster recovered.
> 2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
> 2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: rank=1 was never sent ping request.
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache.dir(0x1000006cf14) commit error -22 v 1933183
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x1000006cf14 object, errno -22
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 mds.0.6872 unhandled write error (22) Invalid argument, force readonly...
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache force file system read-only

The MDS is unable to write a metadata object to the OSD. Set
debug_mds=20 and debug_objecter=20 for the MDS, and capture the MDS
logs when this happens for more details, e.g.,

$ ceph config set mds.<your-MDS-ID> debug_mds 20

Also check the OSD logs when you're hitting this issue. You can reset
the MDS log levels afterwards. You can share the relevant MDS and OSD
logs using ceph-post-file:
https://docs.ceph.com/en/pacific/man/8/ceph-post-file/

> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  0 log_channel(cluster) log [WRN] : force file system read-only
>
> More info:
>
>   cluster:
>     id:     f36b996f-221d-4bcb-834b-19fc20bcad6b
>     health: HEALTH_WARN
>             1 MDSs are read only
>             1 MDSs behind on trimming
>
>   services:
>     mon: 5 daemons, quorum node2-4,node2-5,node3-4,node3-5,node1-1 (age 22h)
>     mgr: node2-4(active, since 28h), standbys: node2-5, node3-4, node3-5, node1-1
>     mds: 3/3 daemons up, 3 standby
>     osd: 112 osds: 112 up (since 22h), 112 in (since 2w)
>
>   data:
>     volumes: 2/2 healthy
>     pools:   12 pools, 529 pgs
>     objects: 8.54M objects, 1.9 TiB
>     usage:   7.8 TiB used, 38 TiB / 46 TiB avail
>     pgs:     491 active+clean
>              29  active+clean+snaptrim
>              9   active+clean+snaptrim_wait
>
> All MDSs, MONs and OSDs are running version 16.2.9.

What are the outputs of `ceph fs status` and `ceph fs dump`?
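
In case it helps, here is roughly the full command sequence I have in
mind (a sketch; the MDS log path below is just an example and depends on
how your cluster was deployed):

$ ceph config set mds.<your-MDS-ID> debug_mds 20
$ ceph config set mds.<your-MDS-ID> debug_objecter 20
# reproduce the commit error (e.g. restart the MDS), then upload the log
$ ceph-post-file /var/log/ceph/ceph-mds.<your-MDS-ID>.log
# reset the debug levels once the logs are captured
$ ceph config rm mds.<your-MDS-ID> debug_mds
$ ceph config rm mds.<your-MDS-ID> debug_objecter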
-Ramana