Hi,
sorry for not responding sooner; our mail server was affected as well,
so I only saw your reply after we got our CephFS back online.
> Do you have the mds log from the initial crash?
I would need to take a closer look, but we're currently dealing with
the affected clients to get everything back in order.
> Also, I don't see the new global_id warnings in your status output --
> did you change any settings from the defaults during this upgrade?
I did indeed disable that warning, and it could well have been during
the upgrade. Could that have caused the MDS damage? We have older
clients that will take some time to update, so I decided to silence
the warning. Was that a mistake? Maybe I just missed it, but I didn't
find any warning about this for the upgrade case. Do you have more
information?
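For reference, I believe I silenced it with something like the
following (quoting from memory, so treat the exact commands as an
assumption on my part):

  # silence the warning about clients still using insecure global_id reclaim
  ceph config set mon mon_warn_on_insecure_global_id_reclaim false
  # silence the warning that insecure reclaim is still allowed at all
  ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false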
Thanks!
Eugen
Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:
Hi,
Do you have the mds log from the initial crash?
Also, I don't see the new global_id warnings in your status output --
did you change any settings from the defaults during this upgrade?
Cheers, Dan
On Tue, May 18, 2021 at 10:22 AM Eugen Block <eblock@xxxxxx> wrote:
Hi *,
I tried a minor update (14.2.9 --> 14.2.20) on our ceph cluster today
and ended up with a damaged CephFS. It's rather urgent since no one
can really work right now, so any quick help is highly appreciated.
As for the update itself, I followed the usual procedure: once all
MONs were finished I started restarting the OSDs, but suddenly our
CephFS became unresponsive (and still is). The restart sequence I used
is roughly sketched below.
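This is only a sketch of what I ran (again from memory, so the exact
commands are an assumption), one node at a time, waiting for the
cluster to settle in between:

  ceph osd set noout                  # set once before the restarts
  systemctl restart ceph-mon.target   # on each MON node first
  systemctl restart ceph-osd.target   # then on each OSD node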
I believe these lines are the critical ones:
---snap---
   -12> 2021-05-18 09:53:01.488 7f7e9ed82700  5 mds.beacon.mds01 received beacon reply up:replay seq 906 rtt 0
   -11> 2021-05-18 09:53:01.624 7f7e9f583700 10 monclient: get_auth_request con 0x5608a5171600 auth_method 0
   -10> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
    -9> 2021-05-18 09:53:03.732 7f7e94d6e700  0 mds.0.log _replay journaler got error -22, aborting
    -8> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
    -7> 2021-05-18 09:53:03.732 7f7e94d6e700  5 mds.beacon.mds01 set_want_state: up:replay -> down:damaged
    -6> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
    -5> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client will send 2021-05-18 09:53:03.735824 mds.mds01 (mds.0) 1 : cluster [ERR] Error loading MDS rank 0: (22) Invalid argument
    -4> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient: _send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
    -3> 2021-05-18 09:53:03.732 7f7e94d6e700  5 mds.beacon.mds01 Sending beacon down:damaged seq 907
    -2> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient: _send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
    -1> 2021-05-18 09:53:03.908 7f7e9ed82700  5 mds.beacon.mds01 received beacon reply down:damaged seq 907 rtt 0.176001
     0> 2021-05-18 09:53:03.908 7f7e94d6e700  1 mds.mds01 respawn!
---snap---
These logs are from the attempt to bring the MDS rank back up with:
  ceph mds repaired 0
I attached a longer excerpt of the log files in case it helps. Before
trying anything from the disaster recovery steps (the ones I'm
considering are sketched below) I'd like to ask for your input, since
those steps can make the damage even worse. The current status is
below; please let me know if more information is required.
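For completeness, these are the journal recovery steps I would
consider, taken from the CephFS disaster recovery documentation (I
have not run any of them yet, and the --rank=cephfs:0 syntax is my
assumption based on our single-rank filesystem named cephfs):

  # back up the journal before touching anything
  cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
  # check whether the journal itself is damaged
  cephfs-journal-tool --rank=cephfs:0 journal inspect
  # recover what can be recovered, then reset the journal
  cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
  cephfs-journal-tool --rank=cephfs:0 journal reset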
Thanks!
Eugen
ceph01:~ # ceph -s
  cluster:
    id:     655cb05a-435a-41ba-83d9-8549f7c36167
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            noout flag(s) set
            Some pool(s) have the nodeep-scrub flag(s) set

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 116m)
    mgr: ceph03(active, since 118m), standbys: ceph02, ceph01
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 32 osds: 32 up (since 64m), 32 in (since 8w)
         flags noout

  data:
    pools:   14 pools, 512 pgs
    objects: 5.08M objects, 8.6 TiB
    usage:   27 TiB used, 33 TiB / 59 TiB avail
    pgs:     512 active+clean
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx