Re: Health check failed: 1 pools ful

Frank Schilder <frans@xxxxxx> · Wed, 15 Sep 2021 07:08:56 +0000

It happened again today:

2021-09-15 04:25:20.551098 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:19:01.512425 [INF]  Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-15 04:19:01.512389 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-15 04:18:05.015251 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:18:05.015217 [ERR]  Health check failed: 1 pools full (POOL_FULL)
2021-09-15 04:13:45.312115 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL) 

We run snapshot rotation on RBD images during this time window. Could this have anything to do with it?
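
A rough way to put numbers on the excerpt above is to pair each "failed" entry with its matching "cleared" entry. The Python sketch below assumes the log format shown above; parse_line and check_windows are hypothetical helper names, not part of any Ceph tooling.

```python
from datetime import datetime

# Log excerpt copied from the message above (2021-09-15 event).
LOG = """\
2021-09-15 04:25:20.551098 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:19:01.512425 [INF]  Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-15 04:19:01.512389 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-15 04:18:05.015251 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:18:05.015217 [ERR]  Health check failed: 1 pools full (POOL_FULL)
2021-09-15 04:13:45.312115 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
"""


def parse_line(line):
    """Split one log line into (timestamp, level, message)."""
    date, time, level, message = line.strip().split(None, 3)
    ts = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    return ts, level.strip("[]"), message


def check_windows(log, code):
    """Return (start, end) pairs during which health check `code` was raised."""
    # The excerpt is newest-first, so sort into chronological order.
    events = sorted(parse_line(l) for l in log.splitlines() if l.strip())
    windows, start = [], None
    for ts, _level, message in events:
        if message.startswith("Health check failed") and f"({code})" in message:
            start = ts
        elif message.startswith(f"Health check cleared: {code} ") and start is not None:
            windows.append((start, ts))
            start = None
    return windows


for code in ("POOL_NEAR_FULL", "POOL_FULL"):
    for start, end in check_windows(LOG, code):
        seconds = (end - start).total_seconds()
        print(f"{code:15s} {start.time()} -> {end.time()}  ({seconds:.1f} s)")
```

On the 2021-09-15 excerpt this gives a single POOL_FULL window of a bit under one minute, which could then be compared with the start and end times of the snapshot rotation job.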

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 13 September 2021 12:20
To: ceph-users
Subject:  Health check failed: 1 pools ful

Hi all,

I recently had a strange blip in the ceph logs:

2021-09-09 04:19:09.612111 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:13:18.187602 [INF]  Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-09 04:13:18.187566 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-09 04:12:09.078878 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:12:09.078850 [ERR]  Health check failed: 1 pools full (POOL_FULL)
2021-09-09 04:08:16.898112 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

None of our pools are anywhere near full or close to their quotas:

# ceph df detail
GLOBAL:
    SIZE       AVAIL       RAW USED     %RAW USED     OBJECTS
    11 PiB     9.6 PiB      1.8 PiB         16.11     845.1 M
POOLS:
    NAME                     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS       DIRTY       READ        WRITE       RAW USED
    sr-rbd-meta-one          1      N/A               500 GiB          90 GiB      0.21        41 TiB         31558     31.56 k     799 MiB     338 MiB      270 GiB
    sr-rbd-data-one          2      N/A               70 TiB           36 TiB     27.96        93 TiB      13966792     13.97 M     4.2 GiB     2.5 GiB       48 TiB
    sr-rbd-one-stretch       3      N/A               1 TiB           222 GiB      0.52        41 TiB         68813     68.81 k     863 MiB     860 MiB      667 GiB
    con-rbd-meta-hpc-one     7      N/A               10 GiB           51 KiB         0       1.7 TiB            61         61      7.0 MiB     3.8 MiB      154 KiB
    con-rbd-data-hpc-one     8      N/A               5 TiB            35 GiB         0       5.9 PiB          9245      9.24 k     144 MiB      78 MiB       44 GiB
    sr-rbd-data-one-hdd      11     N/A               200 TiB         118 TiB     39.90       177 TiB      31460630     31.46 M      14 GiB     2.2 GiB      157 TiB
    con-fs2-meta1            12     N/A               250 GiB         2.0 GiB      0.15       1.3 TiB      18045470     18.05 M      20 MiB     108 MiB      7.9 GiB
    con-fs2-meta2            13     N/A               100 GiB             0 B         0       1.3 TiB     216425275     216.4 M     141 KiB     7.9 MiB          0 B
    con-fs2-data             14     N/A               2.0 PiB         1.3 PiB     18.41       5.9 PiB     541502957     541.5 M     4.9 GiB     5.0 GiB      1.7 PiB
    con-fs2-data-ec-ssd      17     N/A               1 TiB           239 GiB      5.29       4.2 TiB       3225690      3.23 M      17 MiB         0 B      299 GiB
    ms-rbd-one               18     N/A               1 TiB           262 GiB      0.62        41 TiB         73711     73.71 k     4.8 MiB     1.5 GiB      786 GiB
    con-fs2-data2            19     N/A               5 PiB            29 TiB      0.52       5.4 PiB      20322725     20.32 M      83 MiB      97 MiB       39 TiB
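
The quota claim can be double-checked by dividing USED by QUOTA BYTES for each pool. A minimal Python sketch, with the values typed in by hand from the table above; to_bytes and quota_usage are hypothetical helpers, and whether the monitor compares the quota against exactly this USED figure is an assumption.

```python
# Binary unit multipliers as printed by `ceph df detail`.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}

# pool name -> (USED, QUOTA BYTES), copied from the table above.
POOLS = {
    "sr-rbd-meta-one": ("90 GiB", "500 GiB"),
    "sr-rbd-data-one": ("36 TiB", "70 TiB"),
    "sr-rbd-one-stretch": ("222 GiB", "1 TiB"),
    "con-rbd-meta-hpc-one": ("51 KiB", "10 GiB"),
    "con-rbd-data-hpc-one": ("35 GiB", "5 TiB"),
    "sr-rbd-data-one-hdd": ("118 TiB", "200 TiB"),
    "con-fs2-meta1": ("2.0 GiB", "250 GiB"),
    "con-fs2-meta2": ("0 B", "100 GiB"),
    "con-fs2-data": ("1.3 PiB", "2.0 PiB"),
    "con-fs2-data-ec-ssd": ("239 GiB", "1 TiB"),
    "ms-rbd-one": ("262 GiB", "1 TiB"),
    "con-fs2-data2": ("29 TiB", "5 PiB"),
}


def to_bytes(text):
    """Convert a string such as '36 TiB' to a byte count."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def quota_usage(pools):
    """Return (used/quota, pool name) pairs, highest ratio first."""
    ratios = ((to_bytes(used) / to_bytes(quota), name)
              for name, (used, quota) in pools.items())
    return sorted(ratios, reverse=True)


for ratio, name in quota_usage(POOLS):
    print(f"{name:22s} {ratio:6.1%}")
```

With these numbers the highest ratio is con-fs2-data at 65% of its quota.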

I'm not sure whether IO stopped; it does not look like it. The blip might have been an artifact. I could not find any information about which pool(s) caused this.

We are running ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).

Any ideas what is going on, or whether this could be a problem?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx