Hi,

Do you have the mds log from the initial crash?

Also, I don't see the new global_id warnings in your status output --
did you change any settings from the defaults during this upgrade?
(A read-only way to check this is sketched at the end of this message.)

Cheers, Dan

On Tue, May 18, 2021 at 10:22 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi *,
>
> I tried a minor update (14.2.9 --> 14.2.20) on our ceph cluster today
> and ended up with a damaged CephFS. It's rather urgent since no one can
> really work right now, so any quick help is highly appreciated.
>
> As for the update process, I followed the usual procedure. When all
> MONs were finished I started to restart the OSDs, but suddenly our
> CephFS became unresponsive (and still is).
>
> I believe these lines are the critical ones:
>
> ---snap---
>    -12> 2021-05-18 09:53:01.488 7f7e9ed82700  5 mds.beacon.mds01 received beacon reply up:replay seq 906 rtt 0
>    -11> 2021-05-18 09:53:01.624 7f7e9f583700 10 monclient: get_auth_request con 0x5608a5171600 auth_method 0
>    -10> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
>     -9> 2021-05-18 09:53:03.732 7f7e94d6e700  0 mds.0.log _replay journaler got error -22, aborting
>     -8> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
>     -7> 2021-05-18 09:53:03.732 7f7e94d6e700  5 mds.beacon.mds01 set_want_state: up:replay -> down:damaged
>     -6> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
>     -5> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client will send 2021-05-18 09:53:03.735824 mds.mds01 (mds.0) 1 : cluster [ERR] Error loading MDS rank 0: (22) Invalid argument
>     -4> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient: _send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
>     -3> 2021-05-18 09:53:03.732 7f7e94d6e700  5 mds.beacon.mds01 Sending beacon down:damaged seq 907
>     -2> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient: _send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
>     -1> 2021-05-18 09:53:03.908 7f7e9ed82700  5 mds.beacon.mds01 received beacon reply down:damaged seq 907 rtt 0.176001
>      0> 2021-05-18 09:53:03.908 7f7e94d6e700  1 mds.mds01 respawn!
> ---snap---
>
> These logs are from the attempt to bring the MDS rank back up with
>
> ceph mds repaired 0
>
> I attached a longer excerpt of the log files if it helps. Before
> trying anything from the disaster recovery steps I'd like to ask for
> your input, since one can damage it even more. The current status is
> below; please let me know if more information is required.
>
> Thanks!
> Eugen
>
>
> ceph01:~ # ceph -s
>   cluster:
>     id:     655cb05a-435a-41ba-83d9-8549f7c36167
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             1 mds daemon damaged
>             noout flag(s) set
>             Some pool(s) have the nodeep-scrub flag(s) set
>
>   services:
>     mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 116m)
>     mgr: ceph03(active, since 118m), standbys: ceph02, ceph01
>     mds: cephfs:0/1 3 up:standby, 1 damaged
>     osd: 32 osds: 32 up (since 64m), 32 in (since 8w)
>          flags noout
>
>   data:
>     pools:   14 pools, 512 pgs
>     objects: 5.08M objects, 8.6 TiB
>     usage:   27 TiB used, 33 TiB / 59 TiB avail
>     pgs:     512 active+clean
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
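
For reference, the global_id settings Dan is asking about can be
queried read-only. A minimal sketch, assuming 14.2.20 where these
options exist; all three ship with a default of "true", so a "false"
anywhere would explain the missing warnings:

  # do the mons still allow insecure global_id reclaim? (default: true)
  ceph config get mon auth_allow_insecure_global_id_reclaim
  # are the corresponding health warnings enabled? (default: true for both)
  ceph config get mon mon_warn_on_insecure_global_id_reclaim
  ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed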
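
And regarding "before trying anything from the disaster recovery
steps": the upstream disaster-recovery documentation starts with a
journal export, which is non-destructive, as is inspecting the
journal. A minimal sketch, assuming the filesystem is named "cephfs"
as in the status output above:

  # back up the rank 0 journal before any repair attempt
  cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
  # report the journal's integrity without modifying anything
  cephfs-journal-tool --rank=cephfs:0 journal inspect

The inspect output should point at the damaged region that caused the
"decode error from _is_readable" during replay.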