Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

"Matthew Leonard (BLOOMBERG/ 120 PARK)" <mleonard33@xxxxxxxxxxxxx> · Sat, 24 Feb 2024 09:23:31 -0000

It looks like you have quite a few problems. I’ll try to address them one by one.

1) Looks like you had a bunch of crashes. From the ceph -s output, it looks like you don’t have enough MDS daemons running: the status reports insufficient standby MDS daemons and 3 failed cephadm daemons. So you’ll need to restart the crashed containers.

2) It looks like you might have an interesting CRUSH map. Allegedly you have 41 TiB of space available, but you can’t finish recovering: lots of PGs are stuck because their destination OSDs are too full. Are you running homogeneous hardware, or do you have different drive sizes? Are all the weights set correctly?
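
To put numbers on item 2, here is a rough sanity check using only the figures from the ceph -s output quoted below. The two thresholds are Ceph's default nearfull and backfillfull ratios; whether this cluster still uses the defaults is an assumption.

```python
# Figures copied from the `ceph -s` output quoted in the original message.
used_tib = 75
total_tib = 116
misplaced_objects = 17_759_672
total_object_copies = 110_607_480

# Ceph's default thresholds (assumption: this cluster has not changed them).
NEARFULL_RATIO = 0.85
BACKFILLFULL_RATIO = 0.90

# Average fill level across the whole cluster.
cluster_used = used_tib / total_tib
# Share of object copies that still have to move.
misplaced = misplaced_objects / total_object_copies

print(f"cluster-wide usage:  {cluster_used:.1%}")
print(f"misplaced objects:   {misplaced:.3%}")
print(f"average headroom below nearfull:     {NEARFULL_RATIO - cluster_used:.1%}")
print(f"average headroom below backfillfull: {BACKFILLFULL_RATIO - cluster_used:.1%}")
```

On average the cluster is only about 65% full, well under the nearfull mark, yet one OSD is nearfull and some backfill targets are too full. That gap suggests the data is unevenly spread, which is why the drive sizes and CRUSH weights are worth checking (ceph osd df tree shows utilization per OSD).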

Once you correct item 1, you’ll need to correct item 2 to get back to a healthy state.
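
For item 1, a minimal sketch of how the daemons that need restarting could be listed. It assumes the cluster is managed by cephadm and that each entry in the JSON from ceph orch ps --format json carries daemon_type, daemon_id and status_desc keys; the helper names are made up for illustration. It only builds the restart commands, it does not run them.

```python
import json
import subprocess


def daemons_not_running(orch_ps):
    """Return names of daemons whose status is not 'running'.

    Assumption: each entry has 'daemon_type', 'daemon_id' and
    'status_desc' keys, as in `ceph orch ps --format json` output.
    """
    return [
        f"{d['daemon_type']}.{d['daemon_id']}"
        for d in orch_ps
        if d.get("status_desc") != "running"
    ]


def restart_commands(names):
    """Build (but do not run) one restart command per daemon name."""
    return [["ceph", "orch", "daemon", "restart", name] for name in names]


def load_orch_ps():
    """Fetch the daemon list; needs a working `ceph` CLI on this host."""
    result = subprocess.run(
        ["ceph", "orch", "ps", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Usage on an admin node (not executed here):
#   for cmd in restart_commands(daemons_not_running(load_orch_ps())):
#       print(" ".join(cmd))
```

Review the printed commands before running any of them. The "daemons have recently crashed" warning is separate: ceph crash ls lists the reports, and the warning remains until they are archived or age out.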

Sent from Bloomberg Professional for iPhone

----- Original Message -----
From: nguyenvandiep@xxxxxxxxxxxxxx
To: ceph-users@xxxxxxx
At: 02/24/24 09:01:22 UTC

Hi Matthew

Please check my ceph -s output:

ceph -s
  cluster:
    id:     258af72a-cff3-11eb-a261-d4f5ef25154c
    health: HEALTH_WARN
            3 failed cephadm daemon(s)
            1 filesystem is degraded
            insufficient standby MDS daemons available
            1 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 21 pgs backfill_toofull
            15 pool(s) nearfull
            11 daemons have recently crashed

  services:
    mon:         6 daemons, quorum cephgw03,cephosd01,cephgw01,cephosd03,cephgw02,cephosd02 (age 30h)
    mgr:         cephgw01.vwoffq(active, since 17h), standbys: cephgw02.nauphz, cephgw03.aipvii
    mds:         1/1 daemons up
    osd:         29 osds: 29 up (since 40h), 29 in (since 29h); 402 remapped pgs
    rgw:         2 daemons active (2 hosts, 1 zones)
    tcmu-runner: 18 daemons active (2 hosts)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   15 pools, 1457 pgs
    objects: 36.87M objects, 25 TiB
    usage:   75 TiB used, 41 TiB / 116 TiB avail
    pgs:     17759672/110607480 objects misplaced (16.056%)
             1055 active+clean
             363  active+remapped+backfill_wait
             18   active+remapped+backfilling
             14   active+remapped+backfill_toofull
             7
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx