MDS replay takes forever and cephfs is down

I tried restarting an MDS server using:

# systemctl restart ceph-mds@ardmore.service

This caused the standby MDS to take over and enter the replay state, and the
fs has been hanging for several minutes.

In a slight panic I restarted the other MDS server as well; it was replaced
by the standby, which almost immediately entered the resolve state.

fs dump shows a seq number counting upwards very slowly for the replaying
MDS, but I have no idea how far it needs to count; my guess at how to watch
the actual journal position is below, after the dump:

# ceph fs dump

dumped fsmap epoch 1030314
e1030314
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   1030314
flags   12
created 2019-09-09 13:08:26.830927
modified        2021-04-21 14:04:14.672440
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  13610
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 2
in      0,1
up      {0=10398946,1=10404857}
failed
damaged
stopped
data_pools      [1]
metadata_pool   2
inline_data     disabled
balancer
standby_count_wanted    1
[mds.dalmore{0:10398946} state up:replay seq 215 addr [v2:10.0.37.222:6800/2681188441,v1:10.0.37.222:6801/2681188441]]
[mds.cragganmore{1:10404857} state up:resolve seq 201 addr [v2:10.0.37.221:6800/871249119,v1:10.0.37.221:6801/871249119]]


Standby daemons:

[mds.ardmore{-1:10408652} state up:standby seq 2 addr [v2:10.0.37.223:6800/4096598841,v1:10.0.37.223:6801/4096598841]]
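
For what it's worth, my guess (and it is only a guess, so corrections are
very welcome) is that the actual replay progress can be watched on the
replaying MDS itself by comparing the journal read and write positions from
the admin socket:

# ceph daemon mds.dalmore perf dump mds_log

If I understand the mds_log counters correctly, rdpos/wrpos/expos are the
journal read/write/expire byte offsets, so replay should be done once rdpos
catches up with wrpos. I assume the seq number in fs dump is just the beacon
sequence, so it says nothing about how far along replay is.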


Earlier today we added a new OSD host with 12 new OSDs and backfilling is
proceeding as expected:

 cluster:
   id:     e2007417-a346-4af7-8aa9-4ce8f0d73661
   health: HEALTH_WARN
           1 filesystem is degraded
           1 MDSs behind on trimming

 services:
   mon: 3 daemons, quorum cragganmore,dalmore,ardmore (age 5w)
   mgr: ardmore(active, since 2w), standbys: dalmore, cragganmore
   mds: cephfs:2/2 {0=dalmore=up:replay,1=cragganmore=up:resolve} 1 up:standby
   osd: 69 osds: 69 up (since 102m), 69 in (since 102m); 443 remapped pgs

   rgw: 9 daemons active (ardmore.rgw0, ardmore.rgw1, ardmore.rgw2, cragganmore.rgw0, cragganmore.rgw1, cragganmore.rgw2, dalmore.rgw0, dalmore.rgw1, dalmore.rgw2)

 task status:
   scrub status:
       mds.cragganmore: idle
       mds.dalmore: idle

 data:
   pools:   13 pools, 1440 pgs
   objects: 50.57M objects, 9.0 TiB
   usage:   34 TiB used, 37 TiB / 71 TiB avail
   pgs:     30195420/151707033 objects misplaced (19.904%)
            997 active+clean
            431 active+remapped+backfill_wait
            12  active+remapped+backfilling

 io:
   client:   65 MiB/s rd, 206 KiB/s wr, 17 op/s rd, 8 op/s wr
   recovery: 5.5 MiB/s, 23 objects/s

 progress:
   Rebalancing after osd.62 marked in
     [======================........]
   Rebalancing after osd.67 marked in
     [===========...................]
   Rebalancing after osd.68 marked in
     [============..................]
   Rebalancing after osd.64 marked in
     [=====================.........]
   Rebalancing after osd.60 marked in
     [====================..........]
   Rebalancing after osd.66 marked in
     [=============.................]
   Rebalancing after osd.63 marked in
     [=====================.........]
   Rebalancing after osd.61 marked in
     [======================........]
   Rebalancing after osd.59 marked in
     [======================........]
   Rebalancing after osd.58 marked in
     [========================......]
   Rebalancing after osd.57 marked in
     [===========================...]
   Rebalancing after osd.65 marked in
     [==================............]
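
Regarding the "1 MDSs behind on trimming" warning, I assume (again, only
guessing at the relevant knobs) that the backlog can be seen by comparing
the number of journal segments the MDS is holding against
mds_log_max_segments:

# ceph daemon mds.cragganmore perf dump mds_log
# ceph config get mds mds_log_max_segments

I'm also wondering whether the backfill traffic is competing with the
metadata pool reads that replay needs, but I don't know enough to say
whether throttling backfill (e.g. lowering osd_max_backfills) would actually
help here.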


It seems we're running a mix of versions:

# ceph versions
{
   "mon": {
       "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
   },
   "mgr": {
       "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 3
   },
   "osd": {
       "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 57,
       "ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)": 12
   },
   "mds": {
       "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 3
   },
   "rgw": {
       "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 9
   },
   "overall": {
       "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 69,
       "ceph version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable)": 6,
       "ceph version 14.2.20 (36274af6eb7f2a5055f2d53ad448f2694e9046a0) nautilus (stable)": 12
   }
}

Any hints will be greatly appreciated.


-- 
Flemming Frandsen - YAPH - http://osaa.dk - http://dren.dk/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


