Re: MDS rank 0 damaged after update to 14.2.20

Hi,

I took a closer look at what happened yesterday during the update. I'll summarize it for documentation purposes; maybe it helps other users in the same situation (sorry for the long post):

The update process started at around 8 am.
The MONs/MGRs updated successfully. I restarted the OSD daemons on the first host (colocated with MONs) at 8:17 am, the second host at 8:19 and the third at 8:20, and all daemons came back up (the restart sequence itself was nothing special, see the aside below). But I didn't get to update the rest of the OSD servers (EC pool) or the MDS servers, because my user session became unresponsive (home directories are mounted via CephFS) shortly after I restarted the osd.target on the third host. The hang may have been caused by a couple of slow MDS requests, which eventually resolved, but my session never came back. Here's a short excerpt from the logs:

2021-05-18 08:35:51.498 7fcf3f399700 0 log_channel(cluster) log [WRN] : 5 slow requests, 0 included below; oldest blocked for > 975.015743 secs
2021-05-18 08:35:21.494 7fcf3f399700 0 log_channel(cluster) log [WRN] : slow request 124.603846 seconds old, received at 2021-05-18 08:33:16.896561: client_request(client.517094781:5126586689 getattr pAsLsXsFs #0x10000144138 2021-05-18 08:33:16.892492 caller_uid=<user_id>, caller_gid=100{}) currently failed to rdlock, waiting

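As an aside, the per-host OSD restart was just the usual rolling sequence with noout set for the maintenance window (a sketch, not the literal shell history):

ceph osd set noout
systemctl restart ceph-osd.target    # on each OSD host in turn, waiting for active+clean in between
ceph osd unset noout                 # after the last host, which I never got to

The noout flag is still visible in the ceph -s output quoted further down.
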
Then the MDS shut down without an obvious reason; at least I couldn't find any hint of the root cause. These are the last log lines before the respawn:

---snip---
2021-05-18 08:35:51.498 7fcf3f399700 0 log_channel(cluster) log [WRN] : 5 slow requests, 0 included below; oldest blocked for > 975.015743 secs
2021-05-18 08:35:52.790 7fcf4139d700 1 mds.mds01 Updating MDS map to version 27067 from mon.1
2021-05-18 08:35:54.158 7fcf4139d700 1 mds.mds01 Updating MDS map to version 27068 from mon.1
2021-05-18 08:35:55.934 7fcf41e26700 -1 received signal: Terminated from /usr/lib/systemd/systemd --switched-root --system --deserialize 24 (PID: 1) UID: 0
2021-05-18 08:35:55.934 7fcf41e26700 -1 mds.mds01 *** got signal Terminated ***
2021-05-18 08:35:55.934 7fcf41e26700 1 mds.mds01 suicide! Wanted state up:active
2021-05-18 08:35:56.534 7fcf41e26700 1 mds.0.24535 shutdown: shutting down rank 0
2021-05-18 08:35:56.534 7fcf4139d700 0 ms_deliver_dispatch: unhandled message 0x5600e9d4c100 client_request(client.945151521:16960146 setfilelock rule 1, type 4, owner 11936550059977471360, pid 31266, start 0, length 0, wait 0 #0x100000697f5 2021-05-18 08:35:56.101588 caller_uid=<user_id>, caller_gid=100{}) v2 from client.945151521 v1:<IP>:0/4172979547
2021-05-18 08:35:56.534 7fcf4139d700 0 ms_deliver_dispatch: unhandled message 0x5602a0be3c00 osd_map(296392..296392 src has 295891..296392) v4 from mon.1 v1:<IP>:6789/0
2021-05-18 08:35:57.778 7f130d3ab540  0 set uid:gid to 167:167 (ceph:ceph)
2021-05-18 08:35:57.778 7f130d3ab540 0 ceph version 14.2.9-899-gc02349c600 (c02349c60052aaa6c7bd0c2270c7f7be16fab632) nautilus (stable), process ceph-mds, pid 9937
2021-05-18 08:35:57.778 7f130d3ab540  0 pidfile_write: ignore empty --pid-file
2021-05-18 08:35:57.822 7f12fb6fa700 1 mds.mds01 Updating MDS map to version 27070 from mon.1
2021-05-18 08:35:59.214 7f12fb6fa700 1 mds.mds01 Updating MDS map to version 27071 from mon.1
2021-05-18 08:36:03.130 7f12fb6fa700 1 mds.mds01 Updating MDS map to version 27072 from mon.1
2021-05-18 08:36:03.130 7f12fb6fa700 1 mds.mds01 Monitors have assigned me to become a standby.
---snip---

These are the log lines from one of the MONs:

---snip---
2021-05-18 08:35:56.130088 mgr.ceph03 (mgr.969654526) 648 : cluster [DBG] pgmap v720: 512 pgs: 512 active+clean; 8.6 TiB data, 27 TiB used, 33 TiB / 59 TiB avail; 21 MiB/s rd, 55 MiB/s wr, 680 op/s
2021-05-18 08:35:56.415375 mon.ceph01 (mon.0) 2260 : cluster [WRN] Health check failed: 1 filesystem is degraded (FS_DEGRADED)
2021-05-18 08:35:56.415416 mon.ceph01 (mon.0) 2261 : cluster [ERR] Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2021-05-18 08:35:56.415435 mon.ceph01 (mon.0) 2262 : cluster [WRN] Health check update: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
2021-05-18 08:35:56.415450 mon.ceph01 (mon.0) 2263 : cluster [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
2021-05-18 08:35:56.550700 mon.ceph01 (mon.0) 2264 : cluster [DBG] mds.0 v1:<IP>:6800/3681851469 down:dne
2021-05-18 08:35:56.550751 mon.ceph01 (mon.0) 2265 : cluster [INF] Standby daemon mds.mds02 assigned to filesystem cephfs as rank 0
2021-05-18 08:35:56.550880 mon.ceph01 (mon.0) 2266 : cluster [INF] Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
2021-05-18 08:35:56.581947 mon.ceph01 (mon.0) 2267 : cluster [DBG] fsmap cephfs:0/1 1 up:standby-replay 1 up:standby, 1 failed
2021-05-18 08:35:56.662170 mon.ceph01 (mon.0) 2268 : cluster [DBG] fsmap cephfs:1/1 {0=mds02=up:replay} 1 up:standby
2021-05-18 08:35:59.195292 mon.ceph01 (mon.0) 2269 : cluster [DBG] mds.0 v1:<IP>:6800/1315589958 up:replay
2021-05-18 08:35:59.195335 mon.ceph01 (mon.0) 2270 : cluster [DBG] fsmap cephfs:1/1 {0=mds02=up:replay} 1 up:standby
2021-05-18 08:36:03.105907 mon.ceph01 (mon.0) 2273 : cluster [DBG] mds.0 v1:<IP>:6800/1315589958 up:replay
2021-05-18 08:36:03.105941 mon.ceph01 (mon.0) 2274 : cluster [DBG] mds.? v1:<IP>:6800/357038325 up:boot
2021-05-18 08:36:03.105990 mon.ceph01 (mon.0) 2275 : cluster [DBG] fsmap cephfs:1/1 {0=mds02=up:replay} 2 up:standby
2021-05-18 08:36:06.161627 mon.ceph01 (mon.0) 2278 : cluster [DBG] mds.0 v1:<IP>:6800/1315589958 up:replay
2021-05-18 08:36:06.161668 mon.ceph01 (mon.0) 2279 : cluster [DBG] fsmap cephfs:1/1 {0=mds02=up:replay} 2 up:standby
---snip---

But the replay never finished. It took some time to get into a couple of SSH sessions without my home directory; then at 08:47 I tried to provoke a failover with

ceph mds fail 0
ceph mds repaired 0

without success:

---snip---
2021-05-18 08:48:33.678 7f12fb6fa700 1 mds.mds01 Updating MDS map to version 27218 from mon.1
2021-05-18 08:48:33.678 7f12fb6fa700 1 mds.0.27218 handle_mds_map i am now mds.0.27218
2021-05-18 08:48:33.678 7f12fb6fa700 1 mds.0.27218 handle_mds_map state change up:boot --> up:replay
2021-05-18 08:48:33.678 7f12fb6fa700  1 mds.0.27218 replay_start
2021-05-18 08:48:33.678 7f12fb6fa700  1 mds.0.27218  recovery set is
2021-05-18 08:48:33.678 7f12fb6fa700 1 mds.0.27218 waiting for osdmap 296394 (which blacklists prior instance)
2021-05-18 08:48:33.686 7f12f46ec700 0 mds.0.cache creating system inode with ino:0x100
2021-05-18 08:48:33.686 7f12f46ec700 0 mds.0.cache creating system inode with ino:0x1
2021-05-18 08:48:35.770 7f12f36ea700 -1 mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
2021-05-18 08:48:35.770 7f12f36ea700 0 mds.0.log _replay journaler got error -22, aborting
2021-05-18 08:48:35.774 7f12f36ea700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
2021-05-18 08:48:35.858 7f12f36ea700  1 mds.mds01 respawn!
---snip---

The journal was apparently corrupted, as indicated by this line:

2021-05-18 08:48:35.770 7f12f36ea700 -1 mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
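
(For anyone hitting this: the journal state can also be checked non-destructively before touching anything, e.g. with

cephfs-journal-tool --rank=cephfs:0 journal inspect

which in a situation like this should report the overall journal integrity as DAMAGED.)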

I tried a couple of harmless things, like setting max_mds to 2, restarting daemons, and finishing the update on the rest of the servers including the MDS daemons, but nothing really helped.
At 08:57 I silenced the global_id warnings:

ceph config set mon mon_warn_on_insecure_global_id_reclaim false
ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false

But I don't believe it had any impact on the MDS failure.
In the end I dumped the journal as a backup and executed these commands:

---snip---
mds01:~ # systemctl stop ceph-mds@mds01.service

mds01:~ # cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
recover_dentries summary
Events by type:
  OPEN: 9499
  SESSION: 2
  SUBTREEMAP: 161
  UPDATE: 116903
Errors: 0

mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal reset

mds01:~ # cephfs-table-tool all reset session

mds01:~ # systemctl start ceph-mds@mds01.service

mds01:~ # ceph mds repaired 0

mds01:~ # ceph daemon mds.mds01 scrub_path / recursive repair
---snip---
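
(The journal backup mentioned above isn't shown in the snippet; the usual way to dump it before a reset is the export subcommand, something like

cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

so there is at least a copy to fall back to if the recovery attempt makes things worse.)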

This brought the MDS back up, and the standby daemons also started successfully. The scrub didn't show any errors, but shortly after that the MDS reported damaged metadata, so I analysed the damage with

ceph tell mds.mds01 damage ls

Fortunately, those were only a couple of temporary directories from some compile jobs. The developers cleaned them up, but the error didn't go away. In a different mailing list thread a user reported that he had to restart the active MDS daemon, which I also did, and then the cluster finally got back into HEALTH_OK state.
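
(For reference: if restarting the active MDS hadn't cleared it, individual entries can apparently also be dropped from the damage table with something like

ceph tell mds.mds01 damage rm <damage_id>

using the IDs reported by "damage ls", once the affected paths are gone. The restart was enough in our case, so I didn't try that.)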

I'm not sure what the conclusion is, but to me it appears to be just a coincidence that the MDS failed during the update. But if you interpret it differently, I'd be interested to learn more.

Best regards,
Eugen


Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:

On Tue, May 18, 2021 at 4:00 PM Eugen Block <eblock@xxxxxx> wrote:

Hi,

sorry for not responding, our mail server was affected, too, I got
your response after we got our CephFS back online.

Glad to hear it's back online!

> Do you have the mds log from the initial crash?

I would need to take a closer look but we're currently dealing with
the affected clients to get everything back in order.

No rush, but this would be useful to analyze what broke the fs initially.

> Also, I don't see the new global_id warnings in your status output --
> did you change any settings from the defaults during this upgrade?

I definitely did deactivate that warning, it could have indeed been
during the upgrade. Could that have caused the MDS damage? We have
older clients, it may take some time to update so I decided to silence
that warning. Was that a mistake? Maybe I just missed that information
but I didn't find any warnings for the update case. Do you have more
information?

If you set these, it should be safe:

  mon       advanced mon_warn_on_insecure_global_id_reclaim         false
  mon       advanced mon_warn_on_insecure_global_id_reclaim_allowed false

but if you changed these other settings to false, it might have caused the
old mds to error:

auth_allow_insecure_global_id_reclaim
auth_expose_insecure_global_id_reclaim

Cheers, Dan




Thanks!
Eugen


Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:

> Hi,
>
> Do you have the mds log from the initial crash?
>
> Also, I don't see the new global_id warnings in your status output --
> did you change any settings from the defaults during this upgrade?
>
> Cheers, Dan
>
> On Tue, May 18, 2021 at 10:22 AM Eugen Block <eblock@xxxxxx> wrote:
>>
>> Hi *,
>>
>> I tried a minor update (14.2.9 --> 14.2.20) on our ceph cluster today
>> and got into a damaged CephFS. It's rather urgent since noone can
>> really work right now, so any quick help is highly appreciated.
>>
>> As for the update process I followed the usual update procedure, when
>> all MONs were finished I started to restart the OSDs, but suddenly our
>> cephfs got unresponsive (and still is).
>>
>> I believe these lines are the critical ones:
>>
>> ---snap---
>>     -12> 2021-05-18 09:53:01.488 7f7e9ed82700  5 mds.beacon.mds01
>> received beacon reply up:replay seq 906 rtt 0
>>     -11> 2021-05-18 09:53:01.624 7f7e9f583700 10 monclient:
>> get_auth_request con 0x5608a5171600 auth_method 0
>>     -10> 2021-05-18 09:53:03.732 7f7e94d6e700 -1
>> mds.0.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
>>      -9> 2021-05-18 09:53:03.732 7f7e94d6e700  0 mds.0.log _replay
>> journaler got error -22, aborting
>>      -8> 2021-05-18 09:53:03.732 7f7e94d6e700 -1 log_channel(cluster)
>> log [ERR] : Error loading MDS rank 0: (22) Invalid argument
>>      -7> 2021-05-18 09:53:03.732 7f7e94d6e700  5 mds.beacon.mds01
>> set_want_state: up:replay -> down:damaged
>>      -6> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client  log_queue
>> is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
>>      -5> 2021-05-18 09:53:03.732 7f7e94d6e700 10 log_client  will send
>> 2021-05-18 09:53:03.735824 mds.mds01 (mds.0) 1 : cluster [ERR] Error
>> loading MDS rank 0: (22) Invalid argument
>>      -4> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient:
>> _send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
>>      -3> 2021-05-18 09:53:03.732 7f7e94d6e700  5 mds.beacon.mds01
>> Sending beacon down:damaged seq 907
>>      -2> 2021-05-18 09:53:03.732 7f7e94d6e700 10 monclient:
>> _send_mon_message to mon.ceph01 at v2:XXX.XXX.XXX.XXX:3300/0
>>      -1> 2021-05-18 09:53:03.908 7f7e9ed82700  5 mds.beacon.mds01
>> received beacon reply down:damaged seq 907 rtt 0.176001
>>       0> 2021-05-18 09:53:03.908 7f7e94d6e700  1 mds.mds01 respawn!
>> ---snap---
>>
>> These logs are from the attempt to bring the mds rank back up with
>>
>> ceph mds repaired 0
>>
>> I attached a longer excerpt of the log files if it helps. Before
>> trying anything from the disaster recovery steps I'd like to ask for
>> your input since one can damage it even more. The current status is
>> below, please let me know if more information is required.
>>
>> Thanks!
>> Eugen
>>
>>
>> ceph01:~ # ceph -s
>>    cluster:
>>      id:     655cb05a-435a-41ba-83d9-8549f7c36167
>>      health: HEALTH_ERR
>>              1 filesystem is degraded
>>              1 filesystem is offline
>>              1 mds daemon damaged
>>              noout flag(s) set
>>              Some pool(s) have the nodeep-scrub flag(s) set
>>
>>    services:
>>      mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 116m)
>>      mgr: ceph03(active, since 118m), standbys: ceph02, ceph01
>>      mds: cephfs:0/1 3 up:standby, 1 damaged
>>      osd: 32 osds: 32 up (since 64m), 32 in (since 8w)
>>           flags noout
>>
>>    data:
>>      pools:   14 pools, 512 pgs
>>      objects: 5.08M objects, 8.6 TiB
>>      usage:   27 TiB used, 33 TiB / 59 TiB avail
>>      pgs:     512 active+clean
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


