Hi Ramana, and thank you. Yes, before the MDS host's reboot the filesystem was read+write and the cluster was fine too. We haven't upgraded anything since the cluster was installed. Some time ago I had to rebuild 6 OSDs because of a start failure at boot time; no more trouble since.

> What are the outputs of `ceph fs status` and `ceph fs dump`?

root@node3-4:~# ceph fs status
cephfs-ssdrep - 12 clients
=============
RANK  STATE    MDS      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  node3-5   Reqs:  0 /s   26.7k  26.7k   6652     36
 1    active  node2-5   Reqs:  0 /s   21.3k  11.0k   1348      5
         POOL              TYPE      USED   AVAIL
cephfs_ssdrep_metadata   metadata    147G   8533G
cephfs_ssdrep_data         data     1089G   8533G
cephfs-hdd - 14 clients
==========
RANK  STATE    MDS      ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  node2-3   Reqs: 15 /s   1375k  1375k   6867  1092k
        POOL            TYPE      USED   AVAIL
cephfs_hdd_metadata   metadata   21.7G   8533G
cephfs_hdd_data         data     1484G   3231G
STANDBY MDS
  node3-3
  node2-4
  node3-4
MDS version: ceph version 16.2.9 (a569859f5e07da0c4c39da81d5fb5675cd95da49) pacific (stable)

and ceph fs dump:

root@node3-4:~# ceph fs dump
e21896
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: -1

Filesystem 'cephfs-ssdrep' (6)
fs_name cephfs-ssdrep
epoch   21896
flags   12
created 2022-08-04T10:26:26.821650+0200
modified        2022-11-07T13:45:37.711273+0100
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
required_client_features        {}
last_failure    0
last_failure_osd_epoch  917368
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 2
in      0,1
up      {0=40733945,1=38718387}
failed
damaged
stopped
data_pools      [28]
metadata_pool   29
inline_data     disabled
balancer
standby_count_wanted    1
[mds.node3-5{0:40733945} state up:active seq 71 join_fscid=6 addr [v2:192.168.33.13:6800/976767838,v1:192.168.33.13:6801/976767838] compat {c=[1],r=[1],i=[7ff]}]
[mds.node2-5{1:38718387} state up:active seq 9 join_fscid=6 addr [v2:192.168.32.13:6800/155458907,v1:192.168.32.13:6801/155458907] compat {c=[1],r=[1],i=[7ff]}]

Filesystem 'cephfs-hdd' (8)
fs_name cephfs-hdd
epoch   21767
flags   12
created 2022-10-25T14:05:21.065421+0200
modified        2022-11-07T08:38:14.283567+0100
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
required_client_features        {}
last_failure    0
last_failure_osd_epoch  0
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {0=32528084}
failed
damaged
stopped 1
data_pools      [40]
metadata_pool   41
inline_data     disabled
balancer
standby_count_wanted    1
[mds.node2-3{0:32528084} state up:active seq 276773 join_fscid=8 addr [v2:192.168.32.10:6840/1960412605,v1:192.168.32.10:6841/1960412605] compat {c=[1],r=[1],i=[7ff]}]

Standby daemons:

[mds.node3-3{-1:38291353} state up:standby seq 1 join_fscid=8 addr [v2:192.168.33.10:6800/2462925236,v1:192.168.33.10:6801/2462925236] compat {c=[1],r=[1],i=[7ff]}]
[mds.node2-4{-1:38315566} state up:standby seq 1 addr [v2:192.168.32.12:6800/1553911071,v1:192.168.32.12:6801/1553911071] compat {c=[1],r=[1],i=[7ff]}]
[mds.node3-4{-1:40440312} state up:standby seq 1 addr [v2:192.168.33.12:6800/706792986,v1:192.168.33.12:6801/706792986] compat {c=[1],r=[1],i=[7ff]}]

I raised the log verbosity as you recommended and restarted the MDS. It is still being forced into a "read only" state.
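For reference, this is roughly what was run (a minimal sketch; the restart step assumes a package-based, systemd-managed MDS, so the unit name is an assumption and a cephadm/containerized deployment would restart the daemon differently):

$ ceph config set mds.node3-5 debug_mds 20
$ ceph config set mds.node3-5 debug_objecter 20
$ systemctl restart ceph-mds@node3-5
# ...reproduce the read-only transition and capture the log, then reset:
$ ceph config rm mds.node3-5 debug_mds
$ ceph config rm mds.node3-5 debug_objecter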
I'm now trying to sort out which parts of the log are relevant and which are not, or I may just send everything through the ceph-post-file tool.
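If nothing obvious stands out, a sketch of that upload step (the log path below is the default location for a package install and the description text is arbitrary; both are assumptions):

$ ceph-post-file -d "mds.node3-5 forced read-only after host reboot (pacific 16.2.9)" /var/log/ceph/ceph-mds.node3-5.log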
Thanks again for the help.

--
RÉMI GALZIN
Linux System Administrator
57 boulevard Malesherbes, 75008 Paris
Phone: 01 73 50 33 80
Email: rgalzin@xxxxxxxxxx

On 2022-11-04 23:10, Ramana Krisna Venkatesh Raja wrote:
> On Fri, Nov 4, 2022 at 9:36 AM Galzin Rémi <rgalzin@xxxxxxxxxx> wrote:
>
>> Hi,
>> I'm looking for some help/ideas/advice in order to solve the problem
>> that occurs on my metadata server after the server reboot.
>
> You rebooted an MDS's host and your file system became read-only? Was
> the Ceph cluster healthy before the reboot? Any issues with the MDSs or
> OSDs? Did this happen after an upgrade?
>
>> "Ceph status" warns about my MDS being "read only" but the filesystem and
>> the data seem healthy.
>> It is still possible to access the content of my CephFS volumes since
>> it's read only, but I don't know how to make my filesystem writable again.
>>
>> Logs keep showing the same error when I restart the MDS server:
>>
>> 2022-11-04T11:50:14.506+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map state change up:reconnect --> up:rejoin
>> 2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_start
>> 2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_joint_start
>> 2022-11-04T11:50:14.702+0100 7fbbf83c2700  1 mds.0.6872 rejoin_done
>> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.node3-5 Updating MDS map to version 6881 from mon.3
>> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map i am now mds.0.6872
>> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map state change up:rejoin --> up:active
>> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 recovery_done -- successful recovery!
>> 2022-11-04T11:50:15.550+0100 7fbbf83c2700  1 mds.0.6872 active_start
>> 2022-11-04T11:50:15.558+0100 7fbbf83c2700  1 mds.0.6872 cluster recovered.
>> 2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
>> 2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: rank=1 was never sent ping request.
>> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache.dir(0x1000006cf14) commit error -22 v 1933183
>> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x1000006cf14 object, errno -22
>> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 mds.0.6872 unhandled write error (22) Invalid argument, force readonly...
>> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache force file system read-only
>
> The MDS is unable to write a metadata object to the OSD. Set
> debug_mds=20 and debug_objecter=20 for the MDS, and capture the MDS
> logs when this happens for more details.
> e.g.,
> $ ceph config set mds.<your-MDS-ID> debug_mds 20
>
> Also, check the OSD logs when you're hitting this issue.
>
> You can then reset the MDS log level. You can share the relevant MDS
> and OSD logs using,
> https://docs.ceph.com/en/pacific/man/8/ceph-post-file/
>
>> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  0 log_channel(cluster) log [WRN] : force file system read-only
>>
>> More info:
>>
>>   cluster:
>>     id:     f36b996f-221d-4bcb-834b-19fc20bcad6b
>>     health: HEALTH_WARN
>>             1 MDSs are read only
>>             1 MDSs behind on trimming
>>
>>   services:
>>     mon: 5 daemons, quorum node2-4,node2-5,node3-4,node3-5,node1-1 (age 22h)
>>     mgr: node2-4(active, since 28h), standbys: node2-5, node3-4, node3-5, node1-1
>>     mds: 3/3 daemons up, 3 standby
>>     osd: 112 osds: 112 up (since 22h), 112 in (since 2w)
>>
>>   data:
>>     volumes: 2/2 healthy
>>     pools:   12 pools, 529 pgs
>>     objects: 8.54M objects, 1.9 TiB
>>     usage:   7.8 TiB used, 38 TiB / 46 TiB avail
>>     pgs:     491 active+clean
>>              29  active+clean+snaptrim
>>              9   active+clean+snaptrim_wait
>>
>> All MDSs, MONs and OSDs are at version 16.2.9.
>
> What are the outputs of `ceph fs status` and `ceph fs dump`?
>
> -Ramana

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx