On 11/1/16, 1:45 PM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:

>On Tue, 1 Nov 2016, Stillwell, Bryan J wrote:
>> I recently learned that 'MAX AVAIL' in the 'ceph df' output doesn't
>> represent what I thought it did.  It actually represents the amount of
>> data that can be used before the first OSD becomes full, and not the
>> sum of all free space across a set of OSDs.  This means that balancing
>> the data with 'ceph osd reweight' will actually increase the value of
>> 'MAX AVAIL'.
>>
>> Knowing this, I would like to graph both 'MAX AVAIL' and the total free
>> space across two different sets of OSDs so I can get an idea of how out
>> of balance the cluster is.
>>
>> This is where I'm running into trouble.  I have two different types of
>> Ceph nodes in my cluster: one with all HDDs + SSD journals, and the
>> other with all SSDs using co-located journals.  There isn't any cache
>> tiering going on, so a pool either uses the all-HDD root or the all-SSD
>> root, but not both.
>>
>> The only method I can think of to get this information is to walk the
>> CRUSH tree to figure out which OSDs are under a given root, and then
>> use the output of 'ceph osd df -f json' to sum up the free space of
>> each OSD.  Is there a better way?
>
>Try
>
>    ceph osd df tree -f json-pretty
>
>I think that'll give you all the right fields you need to sum.
>
>I wonder if this is something we should be reporting elsewhere, though?
>Summing up all free space is one thing.  Doing it per CRUSH hierarchy is
>something else.  Maybe the 'ceph osd df tree' output could have a field
>summing free space for self + children in the json dump only...

That's just what I was looking for!  The regular 'ceph osd df tree' output
has this information as well:

# ceph osd df tree
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  TYPE NAME
-8 0.52199        -   521G 61835M   461G 11.57 0.57 root ceph-ssd
-5 0.17400        -   173G 20615M   153G 11.58 0.57     host dev-ceph-ssd-001
 9 0.05800  1.00000 59361M  5374M 53987M  9.05 0.45         osd.9
10 0.05800  1.00000 59361M  6837M 52524M 11.52 0.57         osd.10
11 0.05800  1.00000 59361M  8404M 50957M 14.16 0.70         osd.11
-6 0.17400        -   173G 20615M   153G 11.58 0.57     host dev-ceph-ssd-002
12 0.05800  1.00000 59361M  7165M 52196M 12.07 0.60         osd.12
13 0.05800  1.00000 59361M  6762M 52599M 11.39 0.56         osd.13
14 0.05800  1.00000 59361M  6688M 52673M 11.27 0.56         osd.14
-7 0.17400        -   173G 20604M   153G 11.57 0.57     host dev-ceph-ssd-003
15 0.05800  1.00000 59361M  8189M 51172M 13.80 0.68         osd.15
16 0.05800  1.00000 59361M  4835M 54526M  8.15 0.40         osd.16
17 0.05800  1.00000 59361M  7579M 51782M 12.77 0.63         osd.17
-1 0.57596        -   575G   161G   414G 27.97 1.39 root ceph-hdd
-2 0.19199        -   191G 49990M   143G 25.44 1.26     host dev-ceph-hdd-001
 0 0.06400  0.75000 65502M 15785M 49717M 24.10 1.19         osd.0
 1 0.06400  0.64999 65502M 17127M 48375M 26.15 1.30         osd.1
 2 0.06400  0.59999 65502M 17077M 48425M 26.07 1.29         osd.2
-3 0.19199        -   191G 63885M   129G 32.51 1.61     host dev-ceph-hdd-002
 3 0.06400  1.00000 65502M 28681M 36821M 43.79 2.17         osd.3
 4 0.06400  0.59999 65502M 17246M 48256M 26.33 1.30         osd.4
 5 0.06400  0.84999 65502M 17958M 47544M 27.42 1.36         osd.5
-4 0.19199        -   191G 51038M   142G 25.97 1.29     host dev-ceph-hdd-003
 6 0.06400  0.64999 65502M 16617M 48885M 25.37 1.26         osd.6
 7 0.06400  0.70000 65502M 16391M 49111M 25.02 1.24         osd.7
 8 0.06400  0.64999 65502M 18029M 47473M 27.52 1.36         osd.8
              TOTAL   1097G   221G   876G 20.18
MIN/MAX VAR: 0.40/2.17  STDDEV: 9.68

As you can tell, I set the weights so that osd.3 would make the MAX AVAIL
difference more pronounced.
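In case it helps anyone else, here's a rough Python sketch of the
walk-the-tree approach, summing free space per CRUSH root from that json
output.  It assumes the layout I'm seeing here (a top-level 'nodes' list
where bucket entries carry 'children' id lists and OSD entries carry
'kb_avail' in KiB); the bucket entries may already carry an aggregated
kb_avail of their own, and the field names could differ between releases,
so treat it as a starting point rather than anything polished:

#!/usr/bin/env python
# Rough sketch: sum free space per CRUSH root by walking the tree in
# the 'ceph osd df tree -f json' output.  Field names ("nodes",
# "children", "kb_avail", "type") are what I'm assuming from my
# cluster and may differ between Ceph releases.
import json
import subprocess

out = subprocess.check_output(['ceph', 'osd', 'df', 'tree', '-f', 'json'])
nodes = {n['id']: n for n in json.loads(out)['nodes']}

def free_kb(node):
    """Recursively sum kb_avail over every OSD beneath this node."""
    if node['type'] == 'osd':
        return node.get('kb_avail', 0)
    return sum(free_kb(nodes[c]) for c in node.get('children', []))

for n in nodes.values():
    if n['type'] == 'root':
        print('%-12s %8.1f GiB free' % (n['name'], free_kb(n) / 1048576.0))

That should print one line per root (ceph-ssd and ceph-hdd here) with the
summed free space to graph alongside MAX AVAIL.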
Also it appears that VAR is calculated across the whole cluster rather than
per root.

Thanks!
Bryan
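P.S.  Since I was already parsing that json, here's a similarly rough
sketch of the per-root VAR I'd rather see: each OSD's utilization divided
by its root's average utilization instead of the cluster-wide average.
Same caveats as above about the assumed field names:

#!/usr/bin/env python
# Rough sketch: per-root VAR, i.e. each OSD's utilization relative to
# its root's average utilization instead of the cluster-wide average.
# Same field-name assumptions as the earlier sketch, plus "kb",
# "kb_used" and "utilization" on the OSD entries.
import json
import subprocess

out = subprocess.check_output(['ceph', 'osd', 'df', 'tree', '-f', 'json'])
nodes = {n['id']: n for n in json.loads(out)['nodes']}

def osds_under(node):
    """Yield every OSD node at or beneath this node."""
    if node['type'] == 'osd':
        yield node
    for c in node.get('children', []):
        for osd in osds_under(nodes[c]):
            yield osd

for root in (n for n in nodes.values() if n['type'] == 'root'):
    osds = list(osds_under(root))
    size = sum(o['kb'] for o in osds)
    used = sum(o['kb_used'] for o in osds)
    if not size or not used:
        continue
    avg_util = 100.0 * used / size
    print('root %s (avg %%USE %.2f):' % (root['name'], avg_util))
    for o in osds:
        print('  %-7s VAR %.2f' % (o['name'], o['utilization'] / avg_util))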