It happened again today:

2021-09-15 04:25:20.551098 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:19:01.512425 [INF] Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-15 04:19:01.512389 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-15 04:18:05.015251 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:18:05.015217 [ERR] Health check failed: 1 pools full (POOL_FULL)
2021-09-15 04:13:45.312115 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

During this time window, snapshot rotation runs on our RBD images. Could this have anything to do with it?

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 13 September 2021 12:20
To: ceph-users
Subject: Health check failed: 1 pools ful

Hi all,

I recently had a strange blip in the ceph logs:

2021-09-09 04:19:09.612111 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:13:18.187602 [INF] Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-09 04:13:18.187566 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-09 04:12:09.078878 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:12:09.078850 [ERR] Health check failed: 1 pools full (POOL_FULL)
2021-09-09 04:08:16.898112 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

None of our pools are anywhere near full or close to their quotas:

# ceph df detail
GLOBAL:
    SIZE       AVAIL       RAW USED     %RAW USED     OBJECTS
    11 PiB     9.6 PiB     1.8 PiB      16.11         845.1 M
POOLS:
    NAME                  ID  QUOTA OBJECTS  QUOTA BYTES  USED     %USED  MAX AVAIL  OBJECTS    DIRTY     READ     WRITE    RAW USED
    sr-rbd-meta-one       1   N/A            500 GiB      90 GiB   0.21   41 TiB     31558      31.56 k   799 MiB  338 MiB  270 GiB
    sr-rbd-data-one       2   N/A            70 TiB       36 TiB   27.96  93 TiB     13966792   13.97 M   4.2 GiB  2.5 GiB  48 TiB
    sr-rbd-one-stretch    3   N/A            1 TiB        222 GiB  0.52   41 TiB     68813      68.81 k   863 MiB  860 MiB  667 GiB
    con-rbd-meta-hpc-one  7   N/A            10 GiB       51 KiB   0      1.7 TiB    61         61        7.0 MiB  3.8 MiB  154 KiB
    con-rbd-data-hpc-one  8   N/A            5 TiB        35 GiB   0      5.9 PiB    9245       9.24 k    144 MiB  78 MiB   44 GiB
    sr-rbd-data-one-hdd   11  N/A            200 TiB      118 TiB  39.90  177 TiB    31460630   31.46 M   14 GiB   2.2 GiB  157 TiB
    con-fs2-meta1         12  N/A            250 GiB      2.0 GiB  0.15   1.3 TiB    18045470   18.05 M   20 MiB   108 MiB  7.9 GiB
    con-fs2-meta2         13  N/A            100 GiB      0 B      0      1.3 TiB    216425275  216.4 M   141 KiB  7.9 MiB  0 B
    con-fs2-data          14  N/A            2.0 PiB      1.3 PiB  18.41  5.9 PiB    541502957  541.5 M   4.9 GiB  5.0 GiB  1.7 PiB
    con-fs2-data-ec-ssd   17  N/A            1 TiB        239 GiB  5.29   4.2 TiB    3225690    3.23 M    17 MiB   0 B      299 GiB
    ms-rbd-one            18  N/A            1 TiB        262 GiB  0.62   41 TiB     73711      73.71 k   4.8 MiB  1.5 GiB  786 GiB
    con-fs2-data2         19  N/A            5 PiB        29 TiB   0.52   5.4 PiB    20322725   20.32 M   83 MiB   97 MiB   39 TiB

I'm not sure whether IO stopped; it does not look like it did. The blip might have been artificial. I could not find any information about which pool(s) caused this.

We are running ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).

Any ideas what is going on, or whether this could be a problem?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
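As a rough cross-check of the statement that no pool is close to its quota, the USED and QUOTA BYTES columns of the ceph df detail output in the quoted message can be turned into per-pool quota fractions. This is a minimal Python sketch under stated assumptions: the figures are the rounded values copied by hand from that table (it does not query the cluster), and WARN_FRACTION = 0.80 is a hypothetical illustrative level, not a Ceph default or configuration value.

```python
# Rough check of pool usage against byte quotas, using the rounded
# QUOTA BYTES and USED values copied by hand from the ceph df detail table.

# Binary unit multipliers for the suffixes printed in that table.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
         "TiB": 2**40, "PiB": 2**50}

# (pool name, QUOTA BYTES, USED); values are rounded as printed.
POOLS = [
    ("sr-rbd-meta-one", "500 GiB", "90 GiB"),
    ("sr-rbd-data-one", "70 TiB", "36 TiB"),
    ("sr-rbd-one-stretch", "1 TiB", "222 GiB"),
    ("con-rbd-meta-hpc-one", "10 GiB", "51 KiB"),
    ("con-rbd-data-hpc-one", "5 TiB", "35 GiB"),
    ("sr-rbd-data-one-hdd", "200 TiB", "118 TiB"),
    ("con-fs2-meta1", "250 GiB", "2.0 GiB"),
    ("con-fs2-meta2", "100 GiB", "0 B"),
    ("con-fs2-data", "2.0 PiB", "1.3 PiB"),
    ("con-fs2-data-ec-ssd", "1 TiB", "239 GiB"),
    ("ms-rbd-one", "1 TiB", "262 GiB"),
    ("con-fs2-data2", "5 PiB", "29 TiB"),
]

# Hypothetical warning level for illustration only; NOT a Ceph default.
WARN_FRACTION = 0.80


def to_bytes(size: str) -> float:
    """Convert a string such as '2.0 PiB' into a number of bytes."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


def quota_usage(pools):
    """Return {pool name: USED / QUOTA BYTES} for every listed pool."""
    return {name: to_bytes(used) / to_bytes(quota)
            for name, quota, used in pools}


if __name__ == "__main__":
    usage = quota_usage(POOLS)
    # Print pools from most to least quota usage, marking any above the level.
    for name, frac in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        flag = "  <-- above illustrative level" if frac >= WARN_FRACTION else ""
        print(f"{name:22s} {frac:6.1%}{flag}")
```

With these rounded numbers the highest fraction is con-fs2-data at about 65% of its 2.0 PiB quota, so no pool reaches the illustrative 80% level. Because the inputs are rounded, this only supports the order-of-magnitude claim; it cannot rule out a short-lived spike between two ceph df samples.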