I'll pitch in my personal experience.
When a single OSD in a pool becomes full (95% used by default), all client writes to that pool must stop, even if the other OSDs are almost empty. This is done to protect data integrity. [1]
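To illustrate that rule with a minimal sketch (the per-OSD utilization numbers below are made up, and 0.95 is the default mon_osd_full_ratio):

    # One OSD crossing the full ratio is enough to block client writes
    # to the pools mapped onto it, however empty the rest of the cluster is.
    FULL_RATIO = 0.95                            # default mon_osd_full_ratio
    osd_utilization = [0.52, 0.61, 0.96, 0.48]   # hypothetical per-OSD usage
    writes_blocked = any(u >= FULL_RATIO for u in osd_utilization)
    print(writes_blocked)                        # True, because of the single 96% OSD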
To avoid this you need to balance your failure domains.
For example, assuming a replicated pool with size = 2: if one of your failure domains has a weight of 10 and the other has a weight of 3 - you're screwed. Ceph has to keep a copy in both failure domains, so when the second failure domain nears its capacity, the first will still have more than 70% of its storage free.
It's easy to calculate and predict cluster storage capacity when all your failure domains have the same weight and their count is a multiple of your replication size, for example size = 3 with 3, 6, 9, 12, etc. failure domains of equal weight. It becomes much harder when your failure domains have different weights and their count is not a multiple of your replicated pool size. This may be even more complicated with EC pools, but I don't use them, so no experience there.
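To make the capacity math concrete, here is a rough back-of-the-envelope sketch (my own approximation, not how Ceph computes MAX AVAIL): it assumes each replica must land in a distinct failure domain and ignores CRUSH's uneven pseudo-random placement and the full/near-full ratios.

    def usable_capacity(weights, size):
        # Largest k such that k units of data can each get `size` copies
        # in distinct failure domains: sum(min(w, k)) >= k * size.
        lo, hi = 0, sum(weights) // size
        while lo < hi:
            k = (lo + hi + 1) // 2
            if sum(min(w, k) for w in weights) >= k * size:
                lo = k
            else:
                hi = k - 1
        return lo

    # Two failure domains of weight 10 and 3, replicated size = 2:
    print(usable_capacity([10, 3], 2))    # -> 3, not (10 + 3) / 2 = 6.5
    # Three equal domains, size = 3:
    print(usable_capacity([5, 5, 5], 3))  # -> 5

With the 10/3 split above you effectively get only 3 units of data, which leaves 7 of the 10 units in the bigger domain unusable, i.e. the 70% free storage mentioned earlier.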
So what I learned is that you should build your cluster evenly, without heavy imbalance in weights (and in IOPS, for that matter, if you don't want to get slow requests), or you will regularly end up in a situation where a single OSD is in near_full status while the cluster reports terabytes of free storage.
2018-03-05 11:15 GMT+03:00 Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>:
One full OSD has caused that all pools got full. Can anyone help me understand this?

During ongoing PG backfilling I see that MAX AVAIL values are changing while USED values are constant.

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    425T     145T      279T         65.70
POOLS:
    NAME                         ID     USED       %USED     MAX AVAIL     OBJECTS
    volumes                      3      41011G     91.14     3987G         10520026
    default.rgw.buckets.data     20     105T       93.11     7974G         28484000

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    425T     146T      279T         65.66
POOLS:
    NAME                         ID     USED       %USED     MAX AVAIL     OBJECTS
    volumes                      3      41013G     88.66     5246G         10520539
    default.rgw.buckets.data     20     105T       91.13     10492G        28484000
From what I can read in the docs, the MAX AVAIL value is a complicated function of the replication or erasure code used, the CRUSH rule that maps storage to devices, the utilization of those devices, and the configured mon_osd_full_ratio.

Any clue what more I can do to make better use of the available raw storage? Increase the number of PGs for better balanced OSD utilization?

Thanks
Jakub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com