I like John's idea of getting configuration specific to a class from
the monitors, but I can think of a situation where it would be
desirable to have a local configuration like Wido's. On some of the
really high-end flash configurations we have two network adaptors, each
on its own NUMA node, with a distinct IP address. This is an operations
nightmare, because each OSD needs its own definition that points to
the appropriate IP.

[class.numa.1]
cluster_network = 10.1.0.0/24

[class.numa.2]
cluster_network = 10.1.1.0/24

Maybe it doesn't make sense for "classes" to be used this way, because
I can't think of a reason you would want a pool to only use the left
half of every machine, but using the "classes" as a form of "label"
could make this sort of configuration more approachable.

[label.numa.1]
cluster_network = 10.1.0.0/24

[label.numa.2]
cluster_network = 10.1.1.0/24

Perhaps we could have a pluggable system for applying "labels" to OSDs,
where the "class" of an OSD is dictated by possession of some
combination of labels.

Example labels:

mfg: [intel,samsung,sandisk,micron]
numa: [1,2]
bus: [sata,sas,nvme]
rotational: [0,1]
type: [rust,2dnand,3dnand,xpoint]
over_provisioning: [1.1,1.2,1.3]

Then you could create a "gpssd" classifier that includes OSDs with:

bus: sas
rotational: 0

And a "piops" classifier that includes OSDs with:

bus: nvme
over_provisioning: 1.3

On Fri, Feb 3, 2017 at 2:52 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 2 February 2017 at 21:57, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>
>>
>> Hi everyone,
>>
>> I made more updates to http://pad.ceph.com/p/crush-types after the CDM
>> discussion yesterday:
>>
>> - consolidated notes into a single proposal
>> - use otherwise illegal character (e.g., ~) as separator for generated
>>   buckets.  This avoids ambiguity with user-defined buckets.
>> - class-id $class $id properties for each bucket.
>>   This allows us to preserve the derivative bucket ids across a
>>   decompile->compile cycle so that data does not move (the bucket id is
>>   one of many inputs into crush's hash during placement).
>> - simpler rule syntax:
>>
>>     rule ssd {
>>             ruleset 1
>>             step take default class ssd
>>             step chooseleaf firstn 0 type host
>>             step emit
>>     }
>>
>> My rationale here is that we don't want to make this a separate 'step'
>> call since steps map to underlying crush rule step ops, and this is a
>> directive only to the compiler.  Making it an optional step argument seems
>> like the cleanest way to do that.
>>
>> Any other comments before we kick this off?
>>
>
> No, looks good to me! I like combining the class into the 'step'.
>
> Would be very nice to have this in L!
>
> What would be interesting as well is if OSD daemons could somehow access this while parsing their configuration.
>
> E.g.:
>
> [class.ssd]
> osd_op_threads = 16
>
> [class.hdd]
> osd_max_backfills = 1
>
> That way you can keep configuration generic, which makes config management a lot easier.
>
> Wido
>
>> Thanks!
>> sage
>>
>>
>> On Mon, 23 Jan 2017, Loic Dachary wrote:
>>
>> > Hi Wido,
>> >
>> > Updated http://pad.ceph.com/p/crush-types with your proposal for the rule syntax
>> >
>> > Cheers
>> >
>> > On 01/23/2017 03:29 PM, Sage Weil wrote:
>> > > On Mon, 23 Jan 2017, Wido den Hollander wrote:
>> > >>> On 22 January 2017 at 17:44, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> > >>>
>> > >>>
>> > >>> Hi Sage,
>> > >>>
>> > >>> You proposed an improvement to the crush map to address different device types (SSD, HDD, etc.)[1]. When learning how to create a crush map, I was indeed confused by the tricks required to create SSD only pools. After years of practice it feels more natural :-)
>> > >>>
>> > >>> The source of my confusion was mostly because I had to use a hierarchical description to describe something that is not organized hierarchically. "The rack contains hosts that contain devices" is intuitive.
>> > >>> "The rack contains hosts that contain ssd that contain devices" is counterintuitive. Changing:
>> > >>>
>> > >>>     # devices
>> > >>>     device 0 osd.0
>> > >>>     device 1 osd.1
>> > >>>     device 2 osd.2
>> > >>>     device 3 osd.3
>> > >>>
>> > >>> into:
>> > >>>
>> > >>>     # devices
>> > >>>     device 0 osd.0 ssd
>> > >>>     device 1 osd.1 ssd
>> > >>>     device 2 osd.2 hdd
>> > >>>     device 3 osd.3 hdd
>> > >>>
>> > >>> where ssd/hdd is the device class would be much better. However, using the device class like so:
>> > >>>
>> > >>>     rule ssd {
>> > >>>             ruleset 1
>> > >>>             type replicated
>> > >>>             min_size 1
>> > >>>             max_size 10
>> > >>>             step take default:ssd
>> > >>>             step chooseleaf firstn 0 type host
>> > >>>             step emit
>> > >>>     }
>> > >>>
>> > >>> looks arcane. Since the goal is to simplify the description for the first time user, maybe we could have something like:
>> > >>>
>> > >>>     rule ssd {
>> > >>>             ruleset 1
>> > >>>             type replicated
>> > >>>             min_size 1
>> > >>>             max_size 10
>> > >>>             device class = ssd
>> > >>
>> > >> Would that be sane?
>> > >>
>> > >> Why not:
>> > >>
>> > >>     step set-class ssd
>> > >>     step take default
>> > >>     step chooseleaf firstn 0 type host
>> > >>     step emit
>> > >>
>> > >> Since it's a 'step' you take, am I right?
>> > >
>> > > Good idea... a step is a cleaner way to extend the syntax!
>> > >
>> > > sage
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> > >
>> >
>> > --
>> > Loïc Dachary, Artisan Logiciel Libre
>> >
>

--
Kyle Bader
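
P.S. To make the label/classifier idea at the top of this message a bit more concrete, here is a rough Python sketch. Nothing in it exists in Ceph: the function names (matches, classify, settings_for), the data layout, and the three sample OSDs are all made up for illustration. The only assumption about semantics is that a classifier matches an OSD when every label it lists is present with the same value, and that [label.*] sections apply to any OSD carrying that label.

```python
# Hypothetical sketch only: none of these names or structures exist in Ceph.

def matches(labels, required):
    """True if the OSD's labels contain every key/value the classifier requires."""
    return all(labels.get(key) == value for key, value in required.items())


def classify(labels, classifiers):
    """Return the names of all classifiers whose requirements the labels satisfy."""
    return [name for name, required in classifiers.items()
            if matches(labels, required)]


def settings_for(labels, label_config):
    """Merge the options of every [label.<key>.<value>] section the OSD carries."""
    merged = {}
    for (key, value), options in label_config.items():
        if labels.get(key) == value:
            merged.update(options)
    return merged


# The two classifiers described in the message (values kept as strings,
# the way they would come out of a config file).
CLASSIFIERS = {
    "gpssd": {"bus": "sas", "rotational": "0"},
    "piops": {"bus": "nvme", "over_provisioning": "1.3"},
}

# The [label.numa.N] sections from the message.
LABEL_CONFIG = {
    ("numa", "1"): {"cluster_network": "10.1.0.0/24"},
    ("numa", "2"): {"cluster_network": "10.1.1.0/24"},
}

# Made-up OSDs; the label values are illustrative, not real inventory.
OSD_LABELS = {
    "osd.0": {"mfg": "intel", "numa": "1", "bus": "sas",
              "rotational": "0", "type": "2dnand", "over_provisioning": "1.1"},
    "osd.1": {"mfg": "micron", "numa": "2", "bus": "nvme",
              "rotational": "0", "type": "3dnand", "over_provisioning": "1.3"},
    "osd.2": {"mfg": "samsung", "numa": "1", "bus": "sata",
              "rotational": "1", "type": "rust", "over_provisioning": "1.1"},
}

if __name__ == "__main__":
    for osd, labels in OSD_LABELS.items():
        print(osd, classify(labels, CLASSIFIERS),
              settings_for(labels, LABEL_CONFIG))
```

With these sample labels, osd.0 falls into "gpssd", osd.1 into "piops", and osd.2 into neither, while the cluster_network each OSD picks up follows its numa label. Returning a list from classify leaves open whether an OSD may belong to more than one class; that is a design question the sketch does not try to settle.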