Hi Jakub,

Comments inline.

On Tue, Jul 25, 2023 at 11:03 PM Jakub Petrzilka <jakub.petrzilka@xxxxxxxxx> wrote:
> Hello everyone!
>
> Recently we had a very nasty incident with one of our Ceph storage clusters.
>
> During a basic backfill/recovery operation caused by a faulty disk, the
> CephFS metadata pool started growing exponentially until it used all
> available space and the whole cluster died. A usage graph screenshot is
> in the attachment.

It looks like the screenshot was not attached?

So the metadata pool was backed by 12 x 240 GB SSDs, and one of those disks
failed? Could you please share the recovery steps you took after the disk
failure?

> Everything happened very fast: even after the OSDs were marked full, they
> tripped the failsafe and ate all the free blocks, still trying to allocate
> space, and then died completely without any possibility of starting them
> again.

Do you mean that the MDS metadata pool grew exponentially beyond its
allocated size and the MDS processes eventually died?

> The only solution was to copy the whole BlueStore to a bigger SSD and
> resize the underlying BlueStore device. Only about a third of the OSDs
> were able to start after the move, but that was enough since we have very
> redundant settings for the CephFS metadata. Basically, the metadata were
> moved from 12x 240 GB SSDs to 12x 500 GB SSDs to have enough space to
> start again.
>
> Brief info about the cluster:
> - CephFS data are stored on ~500x 8 TB SAS HDDs using 10+2 erasure coding
>   across 18 hosts.
> - CephFS metadata are stored on ~12x 500 GB SAS/SATA SSDs using 5x
>   replication on 6 hosts.
> - The version was one of the latest 16.x.x Pacific releases at the time
>   of the incident.
> - 3x mon+mgr, plus 2 active and 2 hot-standby MDS daemons, run on
>   separate virtual servers.
> - Typical file sizes range from hundreds of MB to tens of GB.
> - This cluster is not the biggest, does not have the most HDDs, and has
>   no special config; I simply see nothing special about it.
>
> During the investigation I found out the following:
> - The metadata pool grows any time recovery is running on any of the
>   clusters we maintain (~15 clusters of different usages and sizes), but
>   never this much; this was an extreme situation.
> - After recovery finished, the size returned to normal.
> - I think there is a slight correlation between the recovery width (the
>   number of objects recovery has to touch in order to recover everything)
>   and the recovery duration, but I have no proof.
> - Nothing much else.
>
> I would like to find out why this happened, because I think it can happen
> again sometime and someone with less luck might lose data. Any ideas are
> appreciated, as is any info on whether anyone has seen similar behavior,
> or whether I am the only one struggling with an issue like this :)
>
> Kind regards,
>
> Jakub Petrzilka

Thanks and Regards,
Kotresh H R
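
A few CLI notes related to the points above. On the OSDs blowing through the
full mark and tripping the failsafe: the cluster-wide nearfull/backfillfull/
full ratios and the per-OSD failsafe can be inspected (and tightened) from
the command line. A minimal sketch using standard Ceph tooling; the ratio
values and the pool name below are placeholders, not a recommendation for
this particular cluster:

    # current cluster-wide ratios (nearfull / backfillfull / full)
    ceph osd dump | grep ratio

    # tighten them if desired (defaults are 0.85 / 0.90 / 0.95)
    ceph osd set-nearfull-ratio 0.80
    ceph osd set-backfillfull-ratio 0.85
    ceph osd set-full-ratio 0.90

    # the last-resort per-OSD failsafe (default 0.97)
    ceph config get osd osd_failsafe_full_ratio

    # one possible (blunt) guard: a pool quota, so runaway metadata growth
    # hits a pool quota before it hits the OSD failsafe (pool name and size
    # are placeholders)
    ceph osd pool set-quota cephfs_metadata max_bytes 300000000000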
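
On the "copy the whole BlueStore to a bigger SSD and resize the underlying
device" step: for LVM-backed, non-containerized OSDs deployed with
ceph-volume, the resize itself is usually lvextend plus ceph-bluestore-tool.
A rough sketch only, with placeholder VG/LV names and OSD id, not the exact
procedure used in this incident:

    # the OSD must be stopped before touching its BlueFS devices
    systemctl stop ceph-osd@12

    # if the data first has to be copied to a new, larger SSD, an LVM pvmove
    # or a raw dd copy can be done before this step
    # grow the logical volume backing the OSD's block device (names are
    # placeholders)
    lvextend -L +260G /dev/ceph-meta-vg/osd-block-12

    # let BlueFS/BlueStore claim the newly added space
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-12

    systemctl start ceph-osd@12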
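
For the suspected correlation between recovery and metadata growth, it may
help to sample pool usage and recovery state together during the next
backfill and compare the timelines afterwards. A simple sketch (the log file
name is arbitrary):

    # snapshot metadata pool usage and recovery activity once a minute
    while sleep 60; do
        date
        ceph df detail | grep -i metadata
        ceph status | grep -E 'recovery|backfill|misplaced|degraded'
    done | tee metadata-vs-recovery.log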