Re: CephFS metadata outgrow DISASTER during recovery

Hi Jakub,

Comments inline.

On Tue, Jul 25, 2023 at 11:03 PM Jakub Petrzilka <jakub.petrzilka@xxxxxxxxx>
wrote:

> Hello everyone!
>
> Recently we had a very nasty incident with one of our Ceph storage clusters.
>
> During a basic backfill recovery operation due to a faulty disk, CephFS
> metadata started growing exponentially until it used all available space
> and the whole cluster DIED. Usage graph screenshot in attachment.
>

Did you miss attaching the screenshot?
So there were 12 x 240 GB SSDs backing the metadata pool, and one of these
disks failed?
Could you please share the recovery steps you took after the disk failed?


> Everything happened very fast, and even when the OSDs were marked full they
> tripped the failsafe and ate all the free blocks, still trying to allocate
> space, and died completely without any possibility of even starting them
> again.
>

Do you mean that the MDS metadata pool grew exponentially beyond its
allocated size and the MDS process eventually died?


> The only solution was to copy the whole BlueStore to a bigger SSD and
> resize the underlying BlueStore device. Only about 1/3 were able to start
> after the move, but it was enough since we have very redundant settings
> for CephFS metadata. Basically, the metadata was moved from 12x 240 GB SSDs
> to 12x 500 GB SSDs to have enough space to start again.
>
> Brief info about the cluster:
> - CephFS data is stored on ~500x 8 TB SAS HDDs using 10+2 EC in 18 hosts.
> - CephFS metadata is stored on ~12x 500 GB SAS/SATA SSDs using 5x
> replication on 6 hosts.
> - Version was one of the latest 16.x.x Pacific at the time of the incident.
> - 3x Mon+mgr and 2 active and 2 hot standby MDS are on separate virtual
> servers.
> - Typical file size to be stored is from hundreds of MBs to tens of GBs.
> - This cluster is not the biggest, does not have the most HDDs, and has no
> special config; I simply see nothing special about this cluster.
>
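
To put the numbers above side by side, here is a rough back-of-the-envelope
sketch of the usable capacity before and after the SSD swap. It assumes
exactly 12 SSDs and 500 HDDs (the original says "~"), an even data
distribution, and it ignores BlueStore overhead and the nearfull/full
ratios, so real headroom will be lower. The function name is only
illustrative.

```python
def usable_gb(n_devices, device_gb, replicas):
    """Raw capacity divided by replica count (ignores overhead and full ratios)."""
    return n_devices * device_gb / replicas

# Metadata pool with 5x replication, before and after the SSD swap.
meta_before_gb = usable_gb(12, 240, 5)
meta_after_gb = usable_gb(12, 500, 5)

# Data pool with 10+2 erasure coding: 10 of every 12 chunks hold data.
ec_efficiency = 10 / (10 + 2)
data_usable_tb = 500 * 8 * ec_efficiency

print(f"metadata usable before: {meta_before_gb:.0f} GB")
print(f"metadata usable after:  {meta_after_gb:.0f} GB")
print(f"data usable (approx.):  {data_usable_tb:.0f} TB")
```

Under these assumptions the metadata pool had roughly 576 GB of usable
space before the move and about 1200 GB after it.
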
> During investigation I found out the following:
> - Metadata usage grows whenever recovery is running on any of the clusters
> we maintain (~15 clusters of different usages and sizes), but not this
> much; this was an extreme situation.
> - After recovery finishes, the size goes back to normal.
> - I think there is a slight correlation with recovery width (objects to be
> touched by recovery in order to recover everything) and recovery length
> (time), but I have no proof.
> - Nothing much else.
>
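
If usage samples from a recovery window are available, a quick way to see
how little time exponential growth leaves is to extrapolate from two
samples. This is only a sketch: the sample values below are hypothetical
(not taken from this incident), and it assumes the growth really is a pure
exponential between and after the two samples.

```python
import math

def minutes_until_full(used0_gb, used1_gb, interval_min, capacity_gb):
    """Extrapolate exponential growth from two usage samples.

    used0_gb and used1_gb are pool usage at the start and end of
    interval_min minutes; capacity_gb is the usable pool size.
    """
    if used1_gb <= used0_gb:
        raise ValueError("usage is not growing between the two samples")
    # Growth rate per minute under the model used(t) = used0 * exp(rate * t).
    rate = math.log(used1_gb / used0_gb) / interval_min
    # Time from the second sample until usage reaches capacity.
    return math.log(capacity_gb / used1_gb) / rate

# Hypothetical numbers: usage doubles from 100 GB to 200 GB in one hour,
# with 800 GB usable in the pool.
eta = minutes_until_full(100, 200, 60, 800)
print(f"estimated minutes until full: {eta:.0f}")
```

With a one-hour doubling time, a pool at 25% usage is full two hours
later, which matches the "everything was very fast" impression.
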
> I would like to find out why this happened because I think it can happen
> again, and someone with less luck might lose data.
> Any ideas are appreciated, as is any info on whether anyone has seen
> similar behavior, or whether I am the only one struggling with an issue
> like this :)
>
> Kind regards,
>
> Jakub Petrzilka
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
Thanks and Regards,
Kotresh H R