Given the current status and configuration of a Ceph cluster, how can I determine how much data can be written to each pool before that pool becomes full? For this calculation we can assume that no further data is written to any other pool. "Full" here also means the point at which any OSD the pool is mapped to becomes full, since that renders the pool full.
ceph df / rados df supply cluster-wide totals for used and available capacity, but these don't indicate how much more data can be written to a specific pool.
This figure needs to be pool-specific, as pools can have different replica counts, meaning they can hold different amounts of data despite having the same raw capacity available to them.
Also, per-pool CRUSH rulesets may restrict some pools to a subset of the OSDs.
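A quick way to check which OSDs a pool's PGs actually land on is something like the sketch below; it assumes the JSON field names (pg_map, pg_stats, pgid, up) that recent releases emit for ceph pg dump --format json, which may differ in other versions:

#!/usr/bin/env python
# List which OSDs each pool's placement groups are mapped to.
# NB: JSON field names ('pg_map', 'pg_stats', 'pgid', 'up') are taken from
# recent Ceph releases and may differ in older ones.
import json
import subprocess

dump = json.loads(subprocess.check_output(
    ['ceph', 'pg', 'dump', '--format', 'json']))
dump = dump.get('pg_map', dump)          # newer releases nest under 'pg_map'

pool_osds = {}
for pg in dump['pg_stats']:
    pool_id = pg['pgid'].split('.')[0]   # pgid looks like "<poolid>.<seq>"
    pool_osds.setdefault(pool_id, set()).update(pg['up'])

for pool_id, osds in sorted(pool_osds.items()):
    print('pool %s -> osds %s' % (pool_id, sorted(osds)))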
A simple example: three pools with replica counts of 1, 2 and 3:
root@ctcephadmin:~# ceph osd dump |grep 'test_.*rep size'
pool 3 'test_1r' rep size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 41 owner 18446744073709551615
pool 4 'test_2r' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 18 owner 18446744073709551615
pool 5 'test_3r' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 19 owner 0
I wrote 3x1GB objects to each pool:
root@ctcephadmin:~# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
90713M 59426M 26678M 29.41
POOLS:
NAME ID USED %USED OBJECTS
test_1r 3 3000M 3.31 3
test_2r 4 3000M 3.31 3
test_3r 5 3000M 3.31 3
<other pools snipped>
So "USED" here seems to be misnamed, to me it would make sense if USED was 3GB,6GB,9GB representing the space required to store the data+replicas in each pool. The USED column could be described as STORED, as it represents the amount of information stored in each pool, not how much raw capacity is used by that pool.
So "USED" here seems to be misnamed, to me it would make sense if USED was 3GB,6GB,9GB representing the space required to store the data+replicas in each pool. The USED column could be described as STORED, as it represents the amount of information stored in each pool, not how much raw capacity is used by that pool.
So, back to my original question: how can I see or calculate how much more data I can store in each pool before that pool is full (assuming no data is written to other pools)? I don't want to simply divide the available space by the replica count, as the differing CRUSH ruleset scenario makes that calculation much more complicated.
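The kind of calculation I have in mind is sketched below; please treat it as a rough approximation only. The JSON field names (pools, pool, pool_name, size, pg_stats, osd_stats, kb_avail, up, pg_map) are what recent releases emit and may differ in other versions, the lower bound assumes equally weighted OSDs with an even PG distribution, and it ignores the near-full/full ratios and filesystem overhead:

#!/usr/bin/env python
# Rough per-pool headroom estimate: for each pool, find the OSDs its PGs map
# to, then bound how much more data fits given the pool's replica count.
# NB: JSON field names below are from recent Ceph releases and may differ.
import json
import subprocess

def ceph_json(*args):
    # Ask the ceph CLI for JSON so we don't have to screen-scrape plain output.
    out = subprocess.check_output(['ceph'] + list(args) + ['--format', 'json'])
    return json.loads(out)

osd_dump = ceph_json('osd', 'dump')
pg_dump = ceph_json('pg', 'dump')
pg_dump = pg_dump.get('pg_map', pg_dump)  # newer releases nest under 'pg_map'

# Free space (in KB) per OSD, from the osd_stats section of pg dump.
kb_avail = {o['osd']: o['kb_avail'] for o in pg_dump['osd_stats']}

# Which OSDs does each pool actually touch?  pgid looks like "<poolid>.<seq>".
pool_osds = {}
for pg in pg_dump['pg_stats']:
    pool_id = int(pg['pgid'].split('.')[0])
    pool_osds.setdefault(pool_id, set()).update(pg['up'])

for pool in osd_dump['pools']:
    osds = pool_osds.get(pool['pool'], set())
    if not osds:
        continue
    size = pool['size']                      # replica count
    avails = [kb_avail[o] for o in osds]
    # Optimistic bound: every free byte on the pool's OSDs is usable.
    upper = sum(avails) / size
    # Pessimistic bound: assume PGs spread evenly over equally weighted OSDs,
    # so the pool is effectively full once its least-free OSD fills up.
    lower = len(avails) * min(avails) / size
    print('%-12s size=%d  headroom ~ %.1f - %.1f GB' % (
        pool['pool_name'], size,
        lower / 1024.0 / 1024.0, upper / 1024.0 / 1024.0))

The lower bound is presumably the one that matters in practice, since CRUSH won't fill OSDs perfectly evenly and the pool is full once any of its OSDs hits the full ratio.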
Any insight appreciated, Thanks!
--
Hugh Saunders