Thank you Dan and Sebastian for trying to help me.
We managed to get back to a normal situation, but we still don't
understand how the problem happened...
How did we get back to an optimal situation?
"Fortunately", we had 3 other "spare" NVMe devices on the cluster that we
were not using yet. We added them to the metadata pool to spread the
metadata. Once this was done, no OSD was full any more, but the MDSs were
in a failed state (1 up, 1 replay and 1 failed). There was very
significant trimming activity, and when it finished, we restarted one of
the MDS servers; the MDS status was then OK (active/active/standby).
After that, the usage of the metadata OSDs decreased back to a
normal level (close to 3%)! OK, that was really fine, but...
...what happened at the beginning? (And the crucial question: how can we be
sure that it will not happen again?)
No answer yet!
We have no explanation for why, within a few hours, the metadata pool grew to
100% (without any specific activity in the data pool).
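To put a number on how fast this was, here is a minimal Python sketch of the arithmetic, using only the figures from this thread (2.7% to 100% in at most 5 hours). The function names are hypothetical and this is not something we ran against the cluster; it assumes a constant growth rate between two samples, which is only a rough approximation.

```python
def fill_rate_per_hour(pct_start, pct_end, hours):
    """Average growth, in percentage points per hour, between two samples."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (pct_end - pct_start) / hours


def hours_until_full(pct_now, rate_per_hour):
    """Hours left before 100% at a constant rate; None if the pool is not growing."""
    if rate_per_hour <= 0:
        return None
    return max(0.0, (100.0 - pct_now) / rate_per_hour)


# Figures from this thread: 2.7% to 100% in 5 hours at most,
# so the real average rate is at least this value.
rate = fill_rate_per_hour(2.7, 100.0, 5.0)
print(f"average growth: at least {rate:.1f} points/hour")
```

With such a rate, a pool sampled at a normal level would be full in about five hours, so any alerting on the metadata pool would probably need to look at the growth rate and not only at a fixed usage threshold.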
Indeed, today's Ceph log is huge (compared to other days).
Today's MDS log seems to show something unusual at 04:10 am (see
here: https://pastebin.com/0CCdLMat)
We currently run Nautilus 14.2.16.
We plan to update soon to the latest Nautilus release, 14.2.21,
and afterwards to upgrade to a newer Ceph release (Octopus, or even Pacific?)
If you have any thoughts on this issue, don't hesitate to comment, thanks.
Regards,
Hervé
On 01/06/2021 at 12:24, Hervé Ballans wrote:
Hi all,
Ceph Nautilus 14.2.16.
We have been encountering a strange and critical problem since this morning.
Our cephfs metadata pool suddenly grew from 2.7% to 100% (in less
than 5 hours), while there is no significant activity on the data OSDs!
Here are some numbers:
# ceph df
RAW STORAGE:
    CLASS    SIZE       AVAIL      USED       RAW USED    %RAW USED
    hdd      205 TiB    103 TiB    102 TiB    102 TiB     49.68
    nvme     4.4 TiB    2.2 TiB    2.1 TiB    2.2 TiB     49.63
    TOTAL    210 TiB    105 TiB    104 TiB    104 TiB     49.68

POOLS:
    POOL                    ID    PGS     STORED     OBJECTS    USED       %USED     MAX AVAIL
    cephfs_data_home        7     512     11 TiB     22.58M     11 TiB     18.31     17 TiB
    cephfs_metadata_home    8     128     724 GiB    2.32M      724 GiB    100.00    0 B
    rbd_backup_vms          9     1024    19 TiB     5.00M      19 TiB     37.08     11 TiB
The cephfs data pool uses less than half of the storage space, and
there was no significant increase during (or before) the period when
the metadata pool became full.
Has anyone already encountered this?
Currently, I have no idea how to solve this problem. Restarting the
associated OSD and MDS services has not helped.
Let me know if you want more information or logs.
Thank you for your help.
Regards,
Hervé
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx