Hi Frank,
I think snapshot rotation could be an explanation.
Just a few days ago we had a host failure overnight, and some OSDs
could not be rebalanced completely because they were too full. Deleting
a few (large) snapshots I had created last week resolved the issue. If
you monitor 'ceph osd df' for a couple of days, you should see spikes
in the OSD usage stats. The only difference I see is that we also had
'OSD nearfull' warnings, which you don't seem to have, so it might be
something else.
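A simple way to capture that history (just a sketch; it assumes cron
and the ceph CLI with admin access on the node, and the log path is
only an example) would be a cron entry like:

*/5 * * * * { date; ceph osd df; } >> /var/log/ceph-osd-df-history.log 2>&1

Grepping that log around 04:00 should then show whether individual
OSDs spike while your snapshot rotation runs.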
Quoting Frank Schilder <frans@xxxxxx>:
It happened again today:
2021-09-15 04:25:20.551098 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:19:01.512425 [INF] Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-15 04:19:01.512389 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-15 04:18:05.015251 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:18:05.015217 [ERR] Health check failed: 1 pools full (POOL_FULL)
2021-09-15 04:13:45.312115 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
During this time, we are running snapshot rotation on RBD images.
Could this have anything to do with it?
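(If it helps, next time I can try to capture per-image usage around
that window, for example with

  rbd du -p sr-rbd-data-one

for each of the affected pools, to see whether the rotation briefly
inflates the reported usage; the pool name above is just one of ours
as an example.)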
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 13 September 2021 12:20
To: ceph-users
Subject: Health check failed: 1 pools full
Hi all,
I recently had a strange blip in the ceph logs:
2021-09-09 04:19:09.612111 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:13:18.187602 [INF] Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-09 04:13:18.187566 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-09 04:12:09.078878 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:12:09.078850 [ERR] Health check failed: 1 pools full (POOL_FULL)
2021-09-09 04:08:16.898112 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
None of our pools are anywhere near full or close to their quotas:
# ceph df detail
GLOBAL:
    SIZE     AVAIL     RAW USED   %RAW USED   OBJECTS
    11 PiB   9.6 PiB   1.8 PiB    16.11       845.1 M
POOLS:
    NAME                  ID   QUOTA OBJECTS   QUOTA BYTES      USED   %USED   MAX AVAIL     OBJECTS     DIRTY      READ     WRITE   RAW USED
    sr-rbd-meta-one        1   N/A                 500 GiB    90 GiB    0.21      41 TiB       31558   31.56 k   799 MiB   338 MiB    270 GiB
    sr-rbd-data-one        2   N/A                  70 TiB    36 TiB   27.96      93 TiB    13966792   13.97 M   4.2 GiB   2.5 GiB     48 TiB
    sr-rbd-one-stretch     3   N/A                   1 TiB   222 GiB    0.52      41 TiB       68813   68.81 k   863 MiB   860 MiB    667 GiB
    con-rbd-meta-hpc-one   7   N/A                  10 GiB    51 KiB       0     1.7 TiB          61        61   7.0 MiB   3.8 MiB    154 KiB
    con-rbd-data-hpc-one   8   N/A                   5 TiB    35 GiB       0     5.9 PiB        9245    9.24 k   144 MiB    78 MiB     44 GiB
    sr-rbd-data-one-hdd   11   N/A                 200 TiB   118 TiB   39.90     177 TiB    31460630   31.46 M    14 GiB   2.2 GiB    157 TiB
    con-fs2-meta1         12   N/A                 250 GiB   2.0 GiB    0.15     1.3 TiB    18045470   18.05 M    20 MiB   108 MiB    7.9 GiB
    con-fs2-meta2         13   N/A                 100 GiB       0 B       0     1.3 TiB   216425275   216.4 M   141 KiB   7.9 MiB        0 B
    con-fs2-data          14   N/A                 2.0 PiB   1.3 PiB   18.41     5.9 PiB   541502957   541.5 M   4.9 GiB   5.0 GiB    1.7 PiB
    con-fs2-data-ec-ssd   17   N/A                   1 TiB   239 GiB    5.29     4.2 TiB     3225690    3.23 M    17 MiB       0 B    299 GiB
    ms-rbd-one            18   N/A                   1 TiB   262 GiB    0.62      41 TiB       73711   73.71 k   4.8 MiB   1.5 GiB    786 GiB
    con-fs2-data2         19   N/A                   5 PiB    29 TiB    0.52     5.4 PiB    20322725   20.32 M    83 MiB    97 MiB     39 TiB
I'm not sure whether IO stopped; it does not look like it. The blip
might have been artificial. I could not find any information about
which pool(s) caused this.
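(Next time I will try to catch it with 'ceph health detail' while the
warning is active, for example via

  ceph health detail | grep -i full

which, as far as I understand, should name the pool(s) behind the
POOL_FULL / POOL_NEAR_FULL checks.)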
We are running ceph version 13.2.10
(564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Any ideas what is going on or if this could be a problem?
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx