Hi,
thanks for the quick response!
CEPH STATUS:
  cluster:
    id:     8c774934-1535-11ec-973e-525400130e4f
    health: HEALTH_ERR
            7 failed cephadm daemon(s)
            There are daemons running an older version of ceph
            1 filesystem is degraded
            1 filesystem has a failed mds daemon
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
            23 daemons have recently crashed

  services:
    mon: 2 daemons, quorum cephadm-vm,store2 (age 12d)
    mgr: store1.uevcpd(active, since 34m), standbys: cephadm-vm.zwagng
    mds: 0/1 daemons up (1 failed), 4 standby
    osd: 324 osds: 318 up (since 3h), 318 in (since 2h)

  data:
    volumes: 0/1 healthy, 1 failed
    pools:   6 pools, 257 pgs
    objects: 2.61M objects, 9.8 TiB
    usage:   29 TiB used, 2.0 PiB / 2.0 PiB avail
    pgs:     257 active+clean

  io:
    client: 0 B/s rd, 2.8 MiB/s wr, 435 op/s rd, 496 op/s wr
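
In case it is useful, this is the (standard, not cluster-specific) set of commands I can run to drill further into the failed cephadm daemons and the recent crashes; <crash-id> below is just a placeholder:

  ceph health detail                    # expand the HEALTH_ERR items listed above
  ceph orch ps --daemon-type mds        # see which MDS containers cephadm reports as down/error
  ceph crash ls                         # list the 23 recent crash reports
  ceph crash info <crash-id>            # details for one crash, id taken from the list above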
FS DUMP:
e60
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1
Filesystem 'ceph_fs' (1)
fs_name ceph_fs
epoch 58
flags 32
created 2022-11-28T12:05:17.203346+0000
modified 2022-12-13T19:03:46.707236+0000
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
required_client_features {}
last_failure 0
last_failure_osd_epoch 196035
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
anchor table,9=file layout v2,10=snaprealm v2}
max_mds 2
in 0
up {}
failed 0
damaged
stopped
data_pools [4]
metadata_pool 5
inline_data disabled
balancer
standby_count_wanted 1
Standby daemons:
[mds.ceph_fs.store5.gnlqqm{-1:152180029} state up:standby seq 1 join_fscid=1 addr [v2:192.168.50.135:6800/3548272808,v1:192.168.50.135:6801/3548272808] compat {c=[1],r=[1],i=[1]}]
[mds.ceph_fs.store6.fxgvoj{-1:152416137} state up:standby seq 1 join_fscid=1 addr [v2:192.168.50.136:7024/1339959968,v1:192.168.50.136:7025/1339959968] compat {c=[1],r=[1],i=[1]}]
[mds.ceph_fs.store4.mhvpot{-1:152477853} state up:standby seq 1 join_fscid=1 addr [v2:192.168.50.134:6800/3098669884,v1:192.168.50.134:6801/3098669884] compat {c=[1],r=[1],i=[1]}]
[mds.ceph_fs.store3.vcnwzh{-1:152481783} state up:standby seq 1 join_fscid=1 addr [v2:192.168.50.133:6800/77378788,v1:192.168.50.133:6801/77378788] compat {c=[1],r=[1],i=[1]}]
dumped fsmap epoch 60
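
Based on the dump above (rank 0 failed, no rank up, four standbys that do not join), this is the rough sequence I plan to try next; the fs name ceph_fs and the daemon name are taken from the output above, everything else is standard Ceph CLI, so please correct me if a step is wrong:

  ceph fs set ceph_fs joinable true                    # make sure the fs is joinable again after my earlier flag changes
  ceph fs set ceph_fs max_mds 1                        # stay at one rank until rank 0 is healthy, then raise it back to 2
  ceph orch daemon restart mds.ceph_fs.store5.gnlqqm   # bounce one of the standbys listed above
  ceph fs status ceph_fs                               # check whether a standby gets promoted to rank 0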
On 13.12.2022 at 20:11, Patrick Donnelly wrote:
On Tue, Dec 13, 2022 at 2:02 PM Mevludin Blazevic
<mblazevic@xxxxxxxxxxxxxx> wrote:
Hi all,
in Ceph Pacific 16.2.5, MDS failover is not working. The host with the active MDS had to be rebooted, and afterwards the standby daemons did not take over. The fs was not accessible; all MDS daemons have remained in standby until now. The cluster also stays in HEALTH_ERR because of the inactive MDS, so I did the following:
ceph fs set cephfs false
ceph fs set cephfs max_mds 2
We also tried to restart the MDS via the given yaml file; nothing works.
The CephFS pools are clean (all PGs active+clean).
Please share:
ceph status
ceph fs dump