I'll pitch in my personal experience.
When a single OSD in a pool becomes full (95% used by default), all client writes to that pool must stop, even if the other OSDs are almost empty. This is done to protect data integrity. [1]
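To illustrate that rule with a minimal sketch (the per-OSD utilization numbers below are made up, and 0.95 is the default mon_osd_full_ratio):

    # One OSD crossing the full ratio is enough to block client writes
    # to the pools mapped onto it, however empty the rest of the cluster is.
    FULL_RATIO = 0.95                            # default mon_osd_full_ratio
    osd_utilization = [0.52, 0.61, 0.96, 0.48]   # hypothetical per-OSD usage
    writes_blocked = any(u >= FULL_RATIO for u in osd_utilization)
    print(writes_blocked)                        # True, because of the single 96% OSD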
To avoid this you need to balance your failure domains.
For example, assuming a replicated pool with size = 2: if one of your failure domains has a weight of 10 and the other has a weight of 3 - you're screwed. Ceph has to keep a copy in both failure domains, so when the second failure domain nears its capacity, the first will still have more than 70% of its storage free.
It's easy to calculate and predict cluster storage capacity when all your failure domains have the same weight and their count is a multiple of your replication size, for example size = 3 with 3, 6, 9, 12, etc. failure domains of equal weight. It becomes much harder when your failure domains have different weights and their count is not a multiple of your replicated pool size. This may be even more complicated with EC pools, but I don't use them, so no experience there.
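To make the capacity math concrete, here is a rough back-of-the-envelope sketch (my own approximation, not how Ceph computes MAX AVAIL): it assumes each replica must land in a distinct failure domain and ignores CRUSH's uneven pseudo-random placement and the full/near-full ratios.

    def usable_capacity(weights, size):
        # Largest k such that k units of data can each get `size` copies
        # in distinct failure domains: sum(min(w, k)) >= k * size.
        lo, hi = 0, sum(weights) // size
        while lo < hi:
            k = (lo + hi + 1) // 2
            if sum(min(w, k) for w in weights) >= k * size:
                lo = k
            else:
                hi = k - 1
        return lo

    # Two failure domains of weight 10 and 3, replicated size = 2:
    print(usable_capacity([10, 3], 2))    # -> 3, not (10 + 3) / 2 = 6.5
    # Three equal domains, size = 3:
    print(usable_capacity([5, 5, 5], 3))  # -> 5

With the 10/3 split above you effectively get only 3 units of data, which leaves 7 of the 10 units in the bigger domain unusable, i.e. the 70% free storage mentioned earlier.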
So what I learned is that you should build your cluster evenly, without heavy imbalance in weights (and in IOPS, for that matter, if you don't want to get slow requests), or you will regularly end up in a situation where a single OSD is in near_full status while the cluster reports terabytes of free storage.
2018-03-05 11:15 GMT+03:00 Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx>:
One full OSD has caused that all pools got full. Can anyone help me understand this?

During ongoing PG backfilling I see that MAX AVAIL values are changing while USED values are constant.

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    425T     145T      279T         65.70
POOLS:
    NAME                         ID     USED       %USED     MAX AVAIL     OBJECTS
    volumes                      3      41011G     91.14     3987G         10520026
    default.rgw.buckets.data     20     105T       93.11     7974G         28484000

GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    425T     146T      279T         65.66
POOLS:
    NAME                         ID     USED       %USED     MAX AVAIL     OBJECTS
    volumes                      3      41013G     88.66     5246G         10520539
    default.rgw.buckets.data     20     105T       91.13     10492G        28484000
From what I can read in the docs, the MAX AVAIL value is a complicated function of the replication or erasure code used, the CRUSH rule that maps storage to devices, the utilization of those devices, and the configured mon_osd_full_ratio.

Any clue what more I can do to make better use of the available raw storage? Increase the number of PGs for better balanced OSD utilization?

Thanks
Jakub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com