Re: mds's stay in up:standby

Eugen Block <eblock@xxxxxx> · Mon, 12 Sep 2022 10:21:41 +0000

Hi,

what happened to the cluster? Several services report a short uptime
(68 minutes). If you share some MDS logs, someone might find a hint as
to why they won't become active. If the regular logs don't reveal
anything, enable debug logs.
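Raising the MDS debug levels is done with `ceph config set`. The sketch below only builds the commands and, by default, prints them instead of running them. It assumes the `ceph` CLI is reachable (in a Rook cluster, typically from the toolbox pod). The chosen levels (`debug_mds` 20, `debug_ms` 1) are a common starting point rather than anything prescribed in this thread, and the helper names are hypothetical.

```python
import subprocess

# Hypothetical choice of levels: verbose MDS logging plus light messenger logging.
DEBUG_SETTINGS = {"debug_mds": "20", "debug_ms": "1"}


def debug_commands(who="mds"):
    """Build `ceph config set <who> <option> <level>` argv lists."""
    return [["ceph", "config", "set", who, opt, level]
            for opt, level in DEBUG_SETTINGS.items()]


def run_commands(commands, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


run_commands(debug_commands())
```

Debug level 20 is very verbose, so once the logs are captured the overrides can be dropped again, e.g. with `ceph config rm mds debug_mds`.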

Quoting Tobias Florek <ceph@xxxxxxxxxx>:

Hi!

I am running a Rook-managed hyperconverged Ceph cluster on
Kubernetes using Ceph 17.2.3, with a single CephFS file system that
has a single rank.

I am now facing the problem that the MDS daemons stay in up:standby.
I tried setting allow_standby_replay to false and restarting both MDS
daemons, but nothing changed.

ceph -s
  cluster:
    id:     08f51f08-9551-488f-9419-787a7717555e
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged

  services:
    mon: 5 daemons, quorum cy,dt,du,dv,dw (age 68m)
    mgr: a(active, since 64m), standbys: b
    mds: 0/1 daemons up, 2 standby
    osd: 10 osds: 10 up (since 68m), 10 in (since 3d)

  data:
    volumes: 0/1 healthy, 1 recovering; 1 damaged
    pools:   14 pools, 273 pgs
    objects: 834.69k objects, 1.2 TiB
    usage:   3.7 TiB used, 23 TiB / 26 TiB avail
    pgs:     273 active+clean

The journal looks OK, though:

cephfs-journal-tool --rank cephfs:0 journal inspect
Overall journal integrity: OK

cephfs-journal-tool --rank cephfs:0 header get
{
    "magic": "ceph fs volume v011",
    "write_pos": 2344234253408,
    "expire_pos": 2344068406026,
    "trimmed_pos": 2344041316352,
    "stream_format": 1,
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 10,
        "pool_ns": ""
    }
}
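The header positions can be sanity-checked with a little arithmetic. A minimal Python sketch, assuming the usual reading of these fields: an MDS taking over the rank replays from `expire_pos` up to `write_pos`, and data before `trimmed_pos` has already been removed from the metadata pool. Only the values shown above are used; `journal_spans` is a hypothetical helper.

```python
import json

# Subset of the `cephfs-journal-tool --rank cephfs:0 header get` output above.
HEADER_JSON = """
{
    "magic": "ceph fs volume v011",
    "write_pos": 2344234253408,
    "expire_pos": 2344068406026,
    "trimmed_pos": 2344041316352,
    "stream_format": 1
}
"""


def journal_spans(header):
    """Return the byte spans between the journal header positions."""
    write_pos = header["write_pos"]
    expire_pos = header["expire_pos"]
    trimmed_pos = header["trimmed_pos"]
    # A consistent header is expected to satisfy trimmed <= expire <= write.
    if not trimmed_pos <= expire_pos <= write_pos:
        raise ValueError("journal header positions are out of order")
    return {
        # Assumed to be the region replayed when an MDS takes over the rank.
        "replay_bytes": write_pos - expire_pos,
        # Expired but not yet trimmed (deleted) from the metadata pool.
        "expired_untrimmed_bytes": expire_pos - trimmed_pos,
    }


spans = journal_spans(json.loads(HEADER_JSON))
print(f"replay span: {spans['replay_bytes']} bytes "
      f"(~{spans['replay_bytes'] / 2**20:.0f} MiB)")
```

With these values the replay span is 165847382 bytes, roughly 158 MiB, and the positions are in the expected order.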

cephfs-journal-tool --rank cephfs:0 event get summary
Events by type:
  OPEN: 47779
  SESSION: 24
  SUBTREEMAP: 113
  UPDATE: 53346
Errors: 0

ceph fs dump
e269368
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   269356
flags   32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created 2020-05-05T21:54:21.907356+0000
modified        2022-09-07T13:32:13.263940+0000
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
required_client_features        {}
last_failure    0
last_failure_osd_epoch  69305
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {}
failed
damaged 0
stopped
data_pools      [11,14]
metadata_pool   10
inline_data     disabled
balancer
standby_count_wanted    1

Standby daemons:

[mds.cephfs-a{-1:94490181} state up:standby seq 1 join_fscid=1 addr [v2:172.21.0.75:6800/3162134136,v1:172.21.0.75:6801/3162134136] compat {c=[1],r=[1],i=[7ff]}]
[mds.cephfs-b{-1:94519600} state up:standby seq 1 join_fscid=1 addr [v2:172.21.0.76:6800/2282837495,v1:172.21.0.76:6801/2282837495] compat {c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 269368
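In the dump above, `up` is empty and rank 0 is listed under `damaged`, which matches the `1 mds daemon damaged` line in `ceph -s`. A rank flagged as damaged is not handed to a standby, which would fit both daemons sitting in up:standby. A minimal sketch for spotting this in the plain-text dump follows. It assumes the rank-set fields (`in`, `failed`, `damaged`, `stopped`) are printed as comma-separated rank numbers, as they appear here with a single rank. `parse_rank_sets` is a hypothetical helper.

```python
# Excerpt of the plain-text `ceph fs dump` output shown above.
FS_DUMP = """\
max_mds 1
in      0
up      {}
failed
damaged 0
stopped
inline_data     disabled
"""

RANK_SET_FIELDS = ("in", "failed", "damaged", "stopped")


def parse_rank_sets(dump):
    """Map each rank-set field to the set of ranks listed for it."""
    ranks = {}
    for line in dump.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line (if any)
        if not parts or parts[0] not in RANK_SET_FIELDS:
            continue
        value = parts[1].strip() if len(parts) > 1 else ""
        ranks[parts[0]] = {int(r) for r in value.split(",") if r}
    return ranks


ranks = parse_rank_sets(FS_DUMP)
# Ranks that are part of the file system but marked damaged.
stuck = ranks["damaged"] & ranks["in"]
print("damaged ranks:", sorted(stuck))
```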

Thank you for your help!
Tobias Florek
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
