Re: Disks are filling up

TL;DR: We could not fix this problem in the end. We ended up with a CephFS in read-only mode (so we could only back up, delete and restore) and one broken OSD (we deleted it and restored onto a "new disk").

I can now wrap up my whole experience with this problem.

After OSD usage had grown to almost 2 TB x 3 OSDs (for data that 'du' counted at about 120 GB), Ceph stopped filling up, and in the week or two that followed most of the used space showed up as free again.
But one OSD did not free up any meaningful amount of space. To my surprise it was an OSD backed by SSDs, not the one on HDDs.
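For reference, per-OSD utilization (data, omap, metadata and free space) can be watched with the stock command:

  # show usage and free space per OSD
  ceph osd df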

It seems the biggest contributing factor was that I created the pools for the CephFS with autoscaling set to on (the default in the cephadm dashboard). This pool never grew beyond 1 PG, although it held a little over 100 GB.

From what I read on this list, a single-PG pool like that is by itself prone to lock contention and other problematic behavior.
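For anyone hitting the same thing, this is roughly how the autoscaler can be checked and pg_num pinned manually (the pool name cephfs_data is a placeholder for your CephFS data pool):

  # show current and target PG counts per pool
  ceph osd pool autoscale-status

  # turn the autoscaler off for the pool and set pg_num by hand
  ceph osd pool set cephfs_data pg_autoscale_mode off
  ceph osd pool set cephfs_data pg_num 32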

Lessons learned:
* If you use pool autoscaling, verify that it actually scales your pools.
  -> I opted to set pg_num for the replacement pools to 32 manually.
* CephFS has a read-only mode that at least lets you back up data in some bad states.
  -> That is good to know: it lets administrators copy data to other storage devices even when nothing can be written (a minimal backup sketch follows this list).
* If you use CephFS for persistent volumes in Kubernetes, be aware that you will probably lose all volumes at the same time when CephFS switches to read-only.
  The CephFS CSI driver does not work on a read-only CephFS: it always writes xattrs on mount (the data pool that should be used and other internal data) and gives up if that fails.
  -> Use a reasonable number of CephFS filesystems for Kubernetes persistent storage so you don't lose all PVs at once.
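To illustrate the backup point: a minimal sketch of copying data off a read-only CephFS with the kernel client; the monitor address, credentials and paths are placeholders:

  # mount the filesystem explicitly read-only
  mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs-ro -o name=admin,secretfile=/etc/ceph/admin.secret,ro

  # copy everything to other storage
  rsync -a /mnt/cephfs-ro/ /backup/cephfs/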

Other problems I found with my configuration:
* We suffer from VMware ESXi misreporting the type of physically attached disks: it reports some SAS SSDs as HDDs. We also have real SAS HDDs attached to some nodes.
  I suspected that to be a problem, and we will exchange the HDDs for SSDs soon, but the big problem was that the device type was baked into the CRUSH map.
  -> I edited the CRUSH map to ignore the type of storage, as it is not very meaningful in our setup anyway (see the first sketch after this list).
* I had PGs stuck in the undersized state for a long time and could not understand why Ceph did not fix them.
  Then I checked the OSD weights (reweights) again, and they were set to different values (1 and 0.85).
  After setting the reweight to 1 on all OSDs, Ceph actually started to bring all PGs into the active+clean state (see the second sketch after this list).
  -> If all the OSDs actually are the same size, I will either not reweight in the future or set the same value, probably 1, on all OSDs.
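Two sketches to go with the items above, with placeholder OSD IDs. Overriding the device class per OSD is an alternative to decompiling and editing the full CRUSH map; checking and resetting reweights is how I would confirm the second problem:

  # override a misdetected device class for one OSD
  ceph osd crush rm-device-class osd.3
  ceph osd crush set-device-class ssd osd.3

  # show weight and reweight per OSD, then reset a reweight back to 1
  ceph osd df tree
  ceph osd reweight 3 1.0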

So now, after noticeable Kubernetes downtime and having to recreate most persistent volumes on the cluster, Ceph health is HEALTH_OK again.

I was able to upgrade to Ceph 16.2.13.
I hope I can now upgrade to 17.2.6 without issues.

Best regards

--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for disabled persons
Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.siam@xxxxxxxxxx | www.oeaw.ac.at/acdh