Hi,
the MDS log should contain information about why it went into read-only
mode. Just a few weeks ago I helped a user with a broken CephFS (the MDS
went into read-only mode because of missing objects in the journal).
Can you check the journal status:
# cephfs-journal-tool --rank=cephfs:0 --journal=mdlog journal inspect
# cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal inspect
and also share the MDS logs (see the sketch below for one way to collect them).
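Judging by the ceph orch ps output further down, your cluster is cephadm-managed, so a minimal sketch for grabbing the log of the read-only rank could look like this (the daemon name is taken from your own output; raising the debug level is only an optional suggestion on my part):

On rke-sh1-2, the host running the active MDS:

# cephadm logs --name mds.cephfs.rke-sh1-2.isqjza > mds.cephfs.rke-sh1-2.isqjza.log

If the default log level is not verbose enough, you can temporarily raise it and remove the override again afterwards:

# ceph config set mds debug_mds 10
# ceph config rm mds debug_mds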
Thanks,
Eugen
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
Dear Ceph Community,
I am having an issue with my Ceph cluster. Several OSDs crashed, but they
are now active again and recovery has finished. However, the CephFS
filesystem can no longer be accessed read-write by clients (a K8s
workload): the single active MDS is in read-only mode and 2 MDSs are
behind on trimming.
The CephFS volume itself is reported as healthy.
The trimming process does not seem to be making any progress; maybe it is stuck?
We are running 3 hosts on Ceph Pacific version 16.2.1.
Here are some logs describing the situation:
ceph versions
{
    "mon": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 18
    },
    "mds": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 6
    },
    "overall": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 33
    }
}
ceph orch ps
NAME                          HOST       STATUS         REFRESHED  AGE  PORTS          VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1  running (21h)  36s ago    21h  -              16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2  running (21h)  3m ago     20M  -              16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1  running (18h)  36s ago    4M   -              16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2  running (18h)  3m ago     23M  -              16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3  running (17h)  36s ago    3M   -              16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  running (21h)  36s ago    2y   *:8082 *:9283  16.2.1   c757e4a3636b  7ce7cfbb3e55
mgr.rke-sh1-2.lxmguj          rke-sh1-2  running (21h)  3m ago     22M  *:8082 *:9283  16.2.1   c757e4a3636b  5a0025adfd46
mgr.rke-sh1-3.ckunvo          rke-sh1-3  running (17h)  36s ago    6M   *:8082 *:9283  16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1  running (20h)  36s ago    20h  -              16.2.1   c757e4a3636b  c0a90103cabc
mon.rke-sh1-2                 rke-sh1-2  running (21h)  3m ago     3M   -              16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2  running (20h)  3m ago     2y   -              16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  running (21h)  36s ago    22M  *:8000         16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  running (21h)  36s ago    22M  *:8001         16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  running (21h)  3m ago     2y   *:8000         16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  running (21h)  3m ago     2y   *:8001         16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  running (17h)  36s ago    9M   *:8001         16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  running (17h)  36s ago    9M   *:8000         16.2.1   c757e4a3636b  e2710265a7f4
ceph health detail
HEALTH_WARN 1 MDSs are read only; 2 MDSs behind on trimming
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.cephfs.rke-sh1-2.isqjza(mds.0): MDS in read-only mode
[WRN] MDS_TRIM: 2 MDSs behind on trimming
    mds.cephfs.rke-sh1-2.isqjza(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
    mds.cephfs.rke-sh1-1.ojmpnk(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
root@rke-sh1-1:~# ceph fs status
cephfs - 27 clients
======
RANK  STATE           MDS                      ACTIVITY    DNS    INOS   DIRS  CAPS
 0    active          cephfs.rke-sh1-2.isqjza  Reqs: 8 /s  85.2k  53.2k  1742  101
0-s   standby-replay  cephfs.rke-sh1-1.ojmpnk  Evts: 0 /s  52.2k  20.2k  1737  0
      POOL         TYPE      USED   AVAIL
cephfs_metadata  metadata  1109G  6082G
  cephfs_data      data    8419G  6082G
STANDBY MDS
cephfs.rke-sh1-3.vdicdn
MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)
ceph status
  cluster:
    id:     fcb373ce-7aaa-11eb-984f-e7c6e0038e87
    health: HEALTH_WARN
            1 MDSs are read only
            2 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 17h)
    mgr: rke-sh1-1.qskoyj(active, since 17h), standbys: rke-sh1-2.lxmguj, rke-sh1-3.ckunvo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 18 up (since 17h), 18 in (since 20h)
    rgw: 6 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 849 pgs
    objects: 10.10M objects, 5.3 TiB
    usage:   11 TiB used, 15 TiB / 26 TiB avail
    pgs:     849 active+clean

  io:
    client:  35 KiB/s rd, 1.0 MiB/s wr, 302 op/s rd, 165 op/s wr
# ceph mds stat
cephfs:1 {0=cephfs.rke-sh1-2.isqjza=up:active} 1 up:standby-replay 1 up:standby
Do you have an idea of what my next steps could be to bring the cluster
back to a healthy state?
Any help would be greatly appreciated.
Thanks a lot for your feedback.
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx