Re: Health check failed: 1 pools full

Hi Frank,

I think the snapshot rotation could be an explanation.
Just a few days ago we had a host failure overnight, and some OSDs couldn't be rebalanced completely because they were too full. Deleting a few (large) snapshots I had created the week before resolved the issue. If you monitor 'ceph osd df' for a couple of days, you should see similar spikes in the OSD usage stats. The only difference I see is that we also had 'OSD nearfull' warnings, which you don't seem to have, so it might be something else.
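
A minimal sketch of how 'ceph osd df' could be sampled over a few days to catch such spikes (log path and interval are just placeholders, run from an admin/mon node):

    while true; do
        { date; ceph osd df; } >> /var/log/ceph-osd-df-history.log   # append a timestamped sample
        sleep 300                                                     # every 5 minutes
    done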


Quoting Frank Schilder <frans@xxxxxx>:

It happened again today:

2021-09-15 04:25:20.551098 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:19:01.512425 [INF] Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-15 04:19:01.512389 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-15 04:18:05.015251 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:18:05.015217 [ERR] Health check failed: 1 pools full (POOL_FULL)
2021-09-15 04:13:45.312115 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

During this time, we are running snapshot rotation on RBD images. Could this have anything to do with it?
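
For context: a rotation step typically creates the new snapshot before removing the oldest one, and space freed by removing old snapshots is only reclaimed once snap trimming completes, so the usage reported for the pool can stay elevated for a while during and after the rotation. A minimal sketch of such a step (pool, image and snapshot names are hypothetical, not our actual script):

    POOL=sr-rbd-data-one                                      # hypothetical pool
    for IMG in $(rbd ls "$POOL"); do
        rbd snap create "$POOL/$IMG@$(date +%F)"              # take today's snapshot
        rbd snap rm "$POOL/$IMG@$(date -d '7 days ago' +%F)"  # drop the week-old one
    done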

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 13 September 2021 12:20
To: ceph-users
Subject: Health check failed: 1 pools full

Hi all,

I recently had a strange blip in the ceph logs:

2021-09-09 04:19:09.612111 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:13:18.187602 [INF] Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-09 04:13:18.187566 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-09 04:12:09.078878 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:12:09.078850 [ERR] Health check failed: 1 pools full (POOL_FULL)
2021-09-09 04:08:16.898112 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

None of our pools are anywhere near full or close to their quotas:

# ceph df detail
GLOBAL:
    SIZE       AVAIL       RAW USED     %RAW USED     OBJECTS
    11 PiB     9.6 PiB      1.8 PiB         16.11     845.1 M
POOLS:
    NAME                     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS       DIRTY       READ        WRITE       RAW USED
    sr-rbd-meta-one          1      N/A               500 GiB          90 GiB      0.21        41 TiB        31558     31.56 k     799 MiB     338 MiB      270 GiB
    sr-rbd-data-one          2      N/A               70 TiB           36 TiB     27.96        93 TiB     13966792     13.97 M     4.2 GiB     2.5 GiB       48 TiB
    sr-rbd-one-stretch       3      N/A               1 TiB           222 GiB      0.52        41 TiB        68813     68.81 k     863 MiB     860 MiB      667 GiB
    con-rbd-meta-hpc-one     7      N/A               10 GiB           51 KiB         0       1.7 TiB           61          61     7.0 MiB     3.8 MiB      154 KiB
    con-rbd-data-hpc-one     8      N/A               5 TiB            35 GiB         0       5.9 PiB         9245      9.24 k     144 MiB      78 MiB       44 GiB
    sr-rbd-data-one-hdd      11     N/A               200 TiB         118 TiB     39.90       177 TiB     31460630     31.46 M      14 GiB     2.2 GiB      157 TiB
    con-fs2-meta1            12     N/A               250 GiB         2.0 GiB      0.15       1.3 TiB     18045470     18.05 M      20 MiB     108 MiB      7.9 GiB
    con-fs2-meta2            13     N/A               100 GiB             0 B         0       1.3 TiB    216425275     216.4 M     141 KiB     7.9 MiB          0 B
    con-fs2-data             14     N/A               2.0 PiB         1.3 PiB     18.41       5.9 PiB    541502957     541.5 M     4.9 GiB     5.0 GiB      1.7 PiB
    con-fs2-data-ec-ssd      17     N/A               1 TiB           239 GiB      5.29       4.2 TiB      3225690      3.23 M      17 MiB         0 B      299 GiB
    ms-rbd-one               18     N/A               1 TiB           262 GiB      0.62        41 TiB        73711     73.71 k     4.8 MiB     1.5 GiB      786 GiB
    con-fs2-data2            19     N/A               5 PiB            29 TiB      0.52       5.4 PiB     20322725     20.32 M      83 MiB      97 MiB       39 TiB
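
The per-pool quotas and the cluster full ratios can be cross-checked directly; for example (picking one pool from the list above):

    # ceph osd pool get-quota sr-rbd-data-one
    # ceph osd dump | grep ratio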

I'm not sure whether I/O stopped; it does not look like it, so the blip might have been spurious. I could not find any information about which pool(s) caused this.
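
If it happens again while someone is watching, the affected pool should be identifiable with, for example:

    # ceph health detail    # names the offending pool while the warning is active
    # ceph log last 100     # recent cluster log entries from the mons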

We are running ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).

Any ideas what is going on, or whether this could be a problem?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



