MDS in ReadOnly and 2 MDS behind on trimming

Dear Ceph Community,

 

I am having an issue with my Ceph cluster. Several OSDs crashed; they are back up now and recovery has finished, but the CephFS filesystem can no longer be accessed read-write by the clients (K8s workloads), because one MDS is in read-only mode and two MDSs are behind on trimming.
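
My understanding is that an MDS puts itself into read-only mode after a failed write to the metadata pool, so the log of the active MDS should contain the reason. This is how I am planning to look for it (cephadm deployment; the daemon name is taken from the ceph orch ps output below, and the grep pattern is only a guess on my side):

# cephadm logs --name mds.cephfs.rke-sh1-2.isqjza | grep -i -E 'read-only|readonly|error'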

 

The CephFS volume itself reports as healthy (ceph status below shows volumes: 1/1 healthy).

 

The trimming process does not seem to be making any progress; it may be stuck.
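
To see whether anything is moving at all, I am watching num_segments in ceph health detail, and I was thinking of pulling the journal counters from the admin socket of the active MDS, roughly like this (run on rke-sh1-2 where that daemon lives; names again from ceph orch ps, and I am not sure about the exact counter names):

# watch -n 30 "ceph health detail | grep num_segments"
# cephadm enter --name mds.cephfs.rke-sh1-2.isqjza
# ceph daemon mds.cephfs.rke-sh1-2.isqjza perf dump mds_log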

 

We are running 3 hosts on Ceph Pacific, version 16.2.1.

 

Here is some output describing the situation:

 

ceph versions
{
    "mon": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 18
    },
    "mds": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 3
    },
    "rgw": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 6
    },
    "overall": {
        "ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)": 33
    }
}

 

ceph orch ps
NAME                          HOST       STATUS         REFRESHED  AGE  PORTS          VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1  running (21h)  36s ago    21h  -              16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2  running (21h)  3m ago     20M  -              16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1  running (18h)  36s ago    4M   -              16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2  running (18h)  3m ago     23M  -              16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3  running (17h)  36s ago    3M   -              16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  running (21h)  36s ago    2y   *:8082 *:9283  16.2.1   c757e4a3636b  7ce7cfbb3e55
mgr.rke-sh1-2.lxmguj          rke-sh1-2  running (21h)  3m ago     22M  *:8082 *:9283  16.2.1   c757e4a3636b  5a0025adfd46
mgr.rke-sh1-3.ckunvo          rke-sh1-3  running (17h)  36s ago    6M   *:8082 *:9283  16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1  running (20h)  36s ago    20h  -              16.2.1   c757e4a3636b  c0a90103cabc
mon.rke-sh1-2                 rke-sh1-2  running (21h)  3m ago     3M   -              16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3  running (17h)  36s ago    17h  -              16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2  running (20h)  3m ago     2y   -              16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1  running (21h)  36s ago    2y   -              16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1  running (20h)  36s ago    2y   -              16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3  running (17h)  36s ago    2y   -              16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2  running (21h)  3m ago     2y   -              16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  running (21h)  36s ago    22M  *:8000         16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  running (21h)  36s ago    22M  *:8001         16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  running (21h)  3m ago     2y   *:8000         16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  running (21h)  3m ago     2y   *:8001         16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  running (17h)  36s ago    9M   *:8001         16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  running (17h)  36s ago    9M   *:8000         16.2.1   c757e4a3636b  e2710265a7f4

 

ceph health detail
HEALTH_WARN 1 MDSs are read only; 2 MDSs behind on trimming
[WRN] MDS_READ_ONLY: 1 MDSs are read only
    mds.cephfs.rke-sh1-2.isqjza(mds.0): MDS in read-only mode
[WRN] MDS_TRIM: 2 MDSs behind on trimming
    mds.cephfs.rke-sh1-2.isqjza(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149
    mds.cephfs.rke-sh1-1.ojmpnk(mds.0): Behind on trimming (2149/128) max_segments: 128, num_segments: 2149

 

root@rke-sh1-1:~# ceph fs status
cephfs - 27 clients
======
RANK      STATE                 MDS               ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active      cephfs.rke-sh1-2.isqjza  Reqs:    8 /s  85.2k  53.2k  1742    101
0-s   standby-replay  cephfs.rke-sh1-1.ojmpnk  Evts:    0 /s  52.2k  20.2k  1737      0
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata  1109G  6082G
  cephfs_data      data    8419G  6082G
      STANDBY MDS
cephfs.rke-sh1-3.vdicdn
MDS version: ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)

 

ceph status
  cluster:
    id:     fcb373ce-7aaa-11eb-984f-e7c6e0038e87
    health: HEALTH_WARN
            1 MDSs are read only
            2 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum rke-sh1-2,rke-sh1-1,rke-sh1-3 (age 17h)
    mgr: rke-sh1-1.qskoyj(active, since 17h), standbys: rke-sh1-2.lxmguj, rke-sh1-3.ckunvo
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 18 up (since 17h), 18 in (since 20h)
    rgw: 6 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 849 pgs
    objects: 10.10M objects, 5.3 TiB
    usage:   11 TiB used, 15 TiB / 26 TiB avail
    pgs:     849 active+clean

  io:
    client:   35 KiB/s rd, 1.0 MiB/s wr, 302 op/s rd, 165 op/s wr

 

 

# ceph mds stat
cephfs:1 {0=cephfs.rke-sh1-2.isqjza=up:active} 1 up:standby-replay 1 up:standby

 

Do you have any idea what my next steps should be to bring the cluster back to a healthy state?
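
For reference, this is what I was considering trying next, but I have not run it yet because I am not sure it is safe in the current state. If I understand correctly, a read-only rank cannot flush its journal, which would also explain the trimming backlog, so the idea would be to restart or fail the active MDS (after making sure the metadata pool is writable again) and let the standby take over and replay:

# ceph orch daemon restart mds.cephfs.rke-sh1-2.isqjza
or
# ceph mds fail cephfs:0

Is this the right approach, or is there something I should check first (for example blocklist entries with ceph osd blocklist ls, or free space on the metadata pool)?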

 

Any help would be very much appreciated.

 

Thanks a lot for your feedback.

 

Best Regards,

 

Edouard FAZENDA

Technical Support

 

 

Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40

 

www.csti.ch

 


