Re: Cephfs metadata pool suddenly full (100%)! [SOLVED but no explanation at this time!]


 



Thank you Dan and Sebastian for trying to help me.

We managed to get back to a normal situation, but we still don't understand how the problem happened...

How did we get back to an optimal situation?
"Fortunately", we had 3 other "spare" NVMe devices on the cluster that we weren't using yet. We added them to the metadata pool to spread out the metadata. Once this was done, no OSD was full any more, but the MDS daemons were in a failed state (1 up, 1 replay and 1 failed). There was very significant trimming activity, and once it finished, we restarted one of the MDS servers; the MDS status then came back to OK (active/active/standby). After that, the occupancy of the metadata OSDs decreased back to a normal level (close to 3%...)! OK, that was really fine, but...
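For the record, the recovery sequence above roughly corresponds to the following commands (a sketch only; the device path, OSD id and hostname are placeholders to adapt to your cluster):

```shell
# 1. Add each spare NVMe device as a new OSD (repeat per device)
ceph-volume lvm create --data /dev/nvme0n1      # /dev/nvme0n1 is a placeholder
ceph osd crush set-device-class nvme osd.42     # osd.42 is a placeholder id

# 2. Watch the MDS states and pool fill level while trimming proceeds
ceph fs status
ceph df detail

# 3. Once trimming has finished, restart the failed MDS daemon
systemctl restart ceph-mds@mds-host-1.service   # mds-host-1 is a placeholder
```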

...what happened in the first place? (And, crucially: how can we be sure it will not happen again?)
No answer yet!
We have no explanation for why, within a few hours, the metadata pool grew to 100% (with no specific activity in the data pool).
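In case it recurs, these are the standard diagnostics we are now keeping an eye on (all plain Ceph CLI; replace the MDS name with your own daemon's id):

```shell
ceph df detail                            # per-pool STORED vs USED over time
ceph osd df class nvme                    # fill level of the metadata OSDs
ceph daemon mds.mds-host-1 perf dump      # MDS journal/trim counters
ceph daemon mds.mds-host-1 dump_ops_in_flight  # stuck client ops, if any
```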

Indeed, today's Ceph log is huge compared to other days.

Today's MDS log seems to show something unusual at 04:10 am (see here: https://pastebin.com/0CCdLMat).

We currently run Nautilus 14.2.16.
We plan to update soon to the latest Nautilus release (14.2.21), and afterwards to upgrade to a newer Ceph release (Octopus, or even Pacific?).

If you have any insight into this issue, don't hesitate to comment. Thanks.

Regards,
Hervé

Le 01/06/2021 à 12:24, Hervé Ballans a écrit :
Hi all,

Ceph  Nautilus 14.2.16.

We have been encountering a strange and critical problem since this morning.

Our cephfs metadata pool suddenly grew from 2.7% to 100% (in less than 5 hours), while there was no significant activity on the data OSDs!

Here are some numbers:

# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED %RAW USED
    hdd       205 TiB     103 TiB     102 TiB      102 TiB 49.68
    nvme      4.4 TiB     2.2 TiB     2.1 TiB      2.2 TiB 49.63
    TOTAL     210 TiB     105 TiB     104 TiB      104 TiB 49.68

POOLS:
    POOL                     ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephfs_data_home          7     512      11 TiB      22.58M      11 TiB      18.31       17 TiB
    cephfs_metadata_home      8     128     724 GiB       2.32M     724 GiB     100.00         0 B
    rbd_backup_vms            9    1024      19 TiB       5.00M      19 TiB      37.08       11 TiB


The cephfs_data pool uses less than half of the storage space, and there was no significant increase during (or before) the period when the metadata pool became full.

Has anyone encountered this before?

Currently, I have no idea how to solve this problem. Restarting the associated OSD and MDS services did not help.

Let me know if you want more information or logs.

Thank you for your help.

Regards,
Hervé


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




