At this stage we are not so worried about recovery, since we have already moved to our new Pacific cluster; the problem arose during one of the nightly syncs from the old cluster to the new one. However, we are quite keen to use this as a learning opportunity and see what we can do to bring this filesystem back to life.

On Wed, 2022-06-01 at 20:11 -0400, Ramana Venkatesh Raja wrote:
> Can you temporarily turn up the MDS debug log level (debug_mds) to
> check what's happening to this MDS during replay?
>
> ceph config set mds debug_mds 10

2022-06-02 09:32:36.814 7faca6d16700  5 mds.beacon.store06 Sending beacon up:replay seq 195662
2022-06-02 09:32:36.814 7faca6d16700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.179:3300/0,v1:192.168.34.179:6789/0] -- mdsbeacon(196066899/store06 up:replay seq 195662 v200622) v7 -- 0x5603d846d200 con 0x560185920c00
2022-06-02 09:32:36.814 7facab51f700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] <== mon.0 v2:192.168.34.179:3300/0 230794 ==== mdsbeacon(196066899/store06 up:replay seq 195662 v200622) v7 ==== 132+0+0 (crc 0 0 0) 0x5603d846d200 con 0x560185920c00
2022-06-02 09:32:36.814 7facab51f700  5 mds.beacon.store06 received beacon reply up:replay seq 195662 rtt 0
2022-06-02 09:32:37.090 7faca4d12700  2 mds.0.cache Memory usage:  total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:37.090 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:38.091 7faca4d12700  2 mds.0.cache Memory usage:  total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:38.091 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:38.320 7faca6515700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.124:6805/1445500,v1:192.168.34.124:6807/1445500] -- mgrreport(unknown.store06 +0-0 packed 1414) v8 -- 0x56018651ae00 con 0x5601869cb400
2022-06-02 09:32:39.092 7faca4d12700  2 mds.0.cache Memory usage:  total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:39.092 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:40.094 7faca4d12700  2 mds.0.cache Memory usage:  total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode
2022-06-02 09:32:40.094 7faca4d12700 10 mds.0.cache cache not ready for trimming
2022-06-02 09:32:40.813 7faca6d16700  5 mds.beacon.store06 Sending beacon up:replay seq 195663
2022-06-02 09:32:40.813 7faca6d16700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] --> [v2:192.168.34.179:3300/0,v1:192.168.34.179:6789/0] -- mdsbeacon(196066899/store06 up:replay seq 195663 v200622) v7 -- 0x5603d846d500 con 0x560185920c00
2022-06-02 09:32:40.813 7facab51f700  1 -- [v2:192.168.34.113:6800/3361270776,v1:192.168.34.113:6801/3361270776] <== mon.0 v2:192.168.34.179:3300/0 230795 ==== mdsbeacon(196066899/store06 up:replay seq 195663 v200622) v7 ==== 132+0+0 (crc 0 0 0) 0x5603d846d500 con 0x560185920c00
2022-06-02 09:32:40.813 7facab51f700  5 mds.beacon.store06 received beacon reply up:replay seq 195663 rtt 0
2022-06-02 09:32:41.095 7faca4d12700  2 mds.0.cache Memory usage:  total 22446592, rss 18448072, heap 332040, baseline 307464, 0 / 6982189 inodes have caps, 0 caps, 0 caps per inode

> Is the health of the MDS host okay? Is it low on memory?

Plenty:

[root@store06 ~]# free
              total        used        free      shared  buff/cache   available
Mem:      131939604    75007512     2646656        3380    54285436    52944852
Swap:      32930300        1800    32928500

> > The cluster is healthy.
>
> Can you share the output of the `ceph status`, `ceph fs status` and
> `ceph --version`?
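As a small operational aside: since debug_mds 10 is quite verbose, a common pattern is to raise the level only long enough to capture the replay behaviour and then remove the override again. This is a sketch of that workflow; the log path and daemon name (store06) are assumptions based on the hostnames in this thread:

```shell
# Raise MDS debug verbosity cluster-wide (as suggested above), then
# watch the local MDS log while the rank is stuck in up:replay.
ceph config set mds debug_mds 10
tail -f /var/log/ceph/ceph-mds.store06.log

# Once enough log has been captured, remove the override so the
# daemon falls back to its default level and log volume stays manageable.
ceph config rm mds debug_mds
```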
[root@store06 ~]# ceph status
  cluster:
    id:     ebaa4a8f-5f17-4d57-b83b-a10f0226efaa
    health: HEALTH_WARN
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum store09,store08,store07 (age 10d)
    mgr: store08(active, since 15h), standbys: store09, store07
    mds: one:2/2 {0=store06=up:replay,1=store05=up:resolve} 3 up:standby
    osd: 116 osds: 116 up (since 10d), 116 in (since 4M)

  data:
    pools:   3 pools, 5121 pgs
    objects: 275.90M objects, 202 TiB
    usage:   625 TiB used, 182 TiB / 807 TiB avail
    pgs:     5115 active+clean
             6    active+clean+scrubbing+deep

[root@store06 ~]# ceph fs status
one - 741 clients
===
+------+---------+---------+----------+-------+-------+
| Rank |  State  |   MDS   | Activity |  dns  |  inos |
+------+---------+---------+----------+-------+-------+
|  0   |  replay | store06 |          | 7012k | 6982k |
|  1   | resolve | store05 |          | 82.9k | 78.4k |
+------+---------+---------+----------+-------+-------+
+------------------+----------+-------+-------+
|       Pool       |   type   |  used | avail |
+------------------+----------+-------+-------+
| weddell_metadata | metadata |  111G | 1963G |
|   weddell_data   |   data   |  622T | 44.0T |
+------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   store09   |
|   store08   |
|   store07   |
+-------------+
MDS version: ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)

[root@store06 ~]# ceph --version
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
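Given rank 0 is sitting in up:replay, two read-only checks can help show what it is (or is not) making progress on. This is only a sketch of the usual diagnostics, not a recovery procedure; the daemon name (store06) and filesystem/rank (one:0) are taken from the output above:

```shell
# On the MDS host itself: ask the daemon for its current state,
# including its journal replay position, via the admin socket.
ceph daemon mds.store06 status

# From any node with access to the metadata pool: inspect the
# rank-0 journal for damage. "journal inspect" is read-only and
# does not modify the journal.
cephfs-journal-tool --rank=one:0 journal inspect
```

If `journal inspect` reports corrupt regions, the usual advice is to take a journal backup (`cephfs-journal-tool --rank=one:0 journal export <file>`) before attempting any of the destructive disaster-recovery steps.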