So I wanted to report some strange behaviour around crush rules / EC profiles and the radosgw pools, and I am not sure whether it is a bug or whether it is supposed to work that way. I am trying to implement the scenario below in my home lab.

By default there is a "default" erasure-code-profile with the following settings:

crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8

From the above we see that it uses the "default" root bucket. Of course you would normally create your own EC profile with a custom algorithm, crush buckets, etc. So let's say we create two new EC profiles: one with crush-root=ssd-performance2 and one with crush-root=default, which has no disks under it (see the ceph osd tree output at the end of this mail):

ceph osd erasure-code-profile set test-ec crush-device-class= crush-failure-domain=host crush-root=ssd-performance2 jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure technique=reed_sol_van w=8

ceph osd erasure-code-profile set test-ec2 crush-device-class= crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure technique=reed_sol_van w=8

Now let's create the associated crush rules that use these profiles:

ceph osd crush rule create-erasure erasure-test-rule test-ec
ceph osd crush rule create-erasure erasure-test-rule2 test-ec2

Now say you have a radosgw server that has started and created the five default radosgw pools (and assume some data has been uploaded as well):

default.rgw.buckets.data
default.rgw.buckets.index
default.rgw.control
default.rgw.log
default.rgw.meta

If you grep for these pools in "ceph osd dump" you will see that all of them use replicated rules, but we want the radosgw data pool to be erasure coded. So let's migrate the default.rgw.buckets.data pool to an erasure-coded one:

1) Shut down the radosgw server so that no new requests come in.
2) ceph osd pool rename default.rgw.buckets.data default.rgw.buckets.data-old
3) ceph osd pool create default.rgw.buckets.data 8 8 erasure test-ec erasure-test-rule
   -> we use the newly created erasure crush rule with the profile we created, i.e. the ssd-performance2 root bucket
4) rados cppool default.rgw.buckets.data-old default.rgw.buckets.data
5) Start the radosgw server again.

At this point I can see the old objects, I can upload new objects through radosgw, and everything works fine. The strange behaviour starts after I do the following: I set default.rgw.buckets.data to use the other erasure crush rule (the one whose root bucket is "default", which has no disks):

ceph osd pool set default.rgw.buckets.data crush_rule erasure-test-rule2

Bug 1?

You can still browse the data, but any attempt to upload or download hangs, with log messages like:

2019-12-18 17:07:07.037 7f05a1ece700 0 ERROR: client_io->complete_request() returned Input/output error
2019-12-18 17:07:07.037 7f05a1ece700 2 req 712 0.004s s3:list_buckets op status=

The monitor nodes don't report anything. It seems new objects simply cannot be stored (which is expected, since the rule has nowhere to place them), but shouldn't the monitors at least raise a warning, or shouldn't there be a CRUSH check up front to verify that the rule can actually place data? Reverting the pool back to erasure-test-rule makes everything work fine again.
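As a side note, the only pre-flight check I could come up with myself is to run the rule through crushtool against the current map before pointing the pool at it. This is just a sketch: the rule id 3 and the --num-rep value are guesses for my erasure-test-rule2 / k+m=3 setup, so adjust them to your own ids:

ceph osd getcrushmap -o /tmp/crushmap.bin                                       # dump the compiled crush map
crushtool -i /tmp/crushmap.bin --test --rule 3 --num-rep 3 --show-bad-mappings  # try to map PGs with that rule
# if every input comes back as a bad mapping, the rule cannot place any data

It would be nice if something like this happened automatically whenever the crush_rule of a pool is changed.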
=================================

Bug 2?

If you then modify the profile behind erasure-test-rule (test-ec) to use an empty crush root, i.e. the same root that erasure-test-rule2 uses, the change is not parsed or picked up by the crush rule at all; the rule seems to skip that part. Example:

ceph osd erasure-code-profile set test-ec crush-root=default --force

At this point nothing happens and radosgw keeps working fine, which it shouldn't if the profile were authoritative, because the data would have nowhere to be saved. Unless the crush root bucket is kept in the crush rule itself and not taken from the erasure-coded profile... even when you force-apply the change to the profile like above.

=================================

Bug 3?

You cannot tell from "ceph osd dump" which crush rule is using which erasure-code profile. You only see that a pool uses crush rule number 1, and if you dump that crush rule it doesn't mention which erasure-code profile it was created from, only the item_name, e.g. the root bucket. Even with telemetry enabled on the latest release, "ceph telemetry show basic" gives the output below and there is no crush-root mentioned anywhere. So does the crush rule take precedence over the erasure_code_profile when it comes to the crush-root bucket?

{
    "min_size": 2,
    "erasure_code_profile": {
        "crush-failure-domain": "host",
        "k": "2",
        "technique": "reed_sol_van",
        "m": "1",
        "plugin": "jerasure"
    },
    "pg_autoscale_mode": "warn",
    "pool": 860,
    "size": 3,
    "cache_mode": "none",
    "target_max_objects": 0,
    "pg_num": 8,
    "pgp_num": 8,
    "target_max_bytes": 0,
    "type": "erasure"
}

root@ceph-mon01:~# ceph osd crush rule dump erasure-test-rule
{
    "rule_id": 2,
    "rule_name": "erasure-test-rule",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -2,
            "item_name": "ssd-performance2"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
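For now the only way I have to connect the dots is to query each piece by hand and cross-check the root by eye (pool, profile and rule names below are of course the ones from my setup):

ceph osd pool get default.rgw.buckets.data crush_rule                   # which rule the pool uses
ceph osd pool get default.rgw.buckets.data erasure_code_profile         # which profile the pool was created with
ceph osd erasure-code-profile get test-ec                               # what crush-root the profile claims
ceph osd crush rule dump erasure-test-rule | grep -A1 '"op": "take"'    # what root the rule actually takes

If the rule really is what matters (as Bug 2 suggests), then having the rule dump at least reference the profile it was created from would make this much less confusing.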
root@ceph-mon01:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                           STATUS REWEIGHT PRI-AFF
-37       0.18398  root really-low
-40       0.09799      host ceph-osd01-really-low
 11   hdd 0.09799          osd.11                      up     1.00000  1.00000
-41       0.04799      host ceph-osd02-really-low
  1   hdd 0.01900          osd.1                       up     1.00000  1.00000
  9   hdd 0.02899          osd.9                       up     1.00000  1.00000
-42       0.03799      host ceph-osd03-really-low
  6   hdd 0.01900          osd.6                       up     1.00000  1.00000
  7   hdd 0.01900          osd.7                       up     1.00000  1.00000
-23      10.67598  root spinning-rust
-20       2.04900      rack rack1
 -3       2.04900          host ceph-osd01
  3   hdd 0.04900              osd.3                   up     0.95001  1.00000
 22   hdd 1.00000              osd.22                  up     0.90002  1.00000
 17   ssd 1.00000              osd.17                  up     1.00000  1.00000
-25       3.07799      rack rack2
 -5       3.07799          host ceph-osd02
  4   hdd 0.04900              osd.4                   up     1.00000  1.00000
  8   hdd 0.02899              osd.8                   up     1.00000  1.00000
 23   hdd 1.00000              osd.23                  up     1.00000  1.00000
 25   hdd 1.00000              osd.25                  up     1.00000  1.00000
 12   ssd 1.00000              osd.12                  up     1.00000  1.00000
-28       3.54900      rack rack3
 -7       3.54900          host ceph-osd03
  0   hdd 1.00000              osd.0                   up     0.90002  1.00000
  5   hdd 0.04900              osd.5                   up     1.00000  1.00000
 30   hdd 0.50000              osd.30                  up     1.00000  1.00000
 21   ssd 1.00000              osd.21                  up     0.95001  1.00000
 24   ssd 1.00000              osd.24                  up     1.00000  1.00000
-55       2.00000      rack rack4
-49       2.00000          host ceph-osd04
 26   hdd 1.00000              osd.26                  up     1.00000  1.00000
 27   hdd 1.00000              osd.27                  up     1.00000  1.00000
 -2       9.10799  root ssd-performance2
-32       2.09799      host ceph-osd01-ssd
  2   ssd 0.09799          osd.2                       up     1.00000  1.00000
 13   ssd 1.00000          osd.13                      up     1.00000  1.00000
 16   ssd 1.00000          osd.16                      up     1.00000  1.00000
-31       3.00000      host ceph-osd02-ssd
 14   ssd 1.00000          osd.14                      up     1.00000  1.00000
 18   ssd 1.00000          osd.18                      up     1.00000  1.00000
 19   ssd 1.00000          osd.19                      up     1.00000  1.00000
 -9       2.00999      host ceph-osd03-ssd
 10   ssd 0.00999          osd.10                      up     0.90002  1.00000
 15   ssd 1.00000          osd.15                      up     1.00000  1.00000
 20   ssd 1.00000          osd.20                      up     1.00000  1.00000
-52       2.00000      host ceph-osd04-ssd
 28   ssd 1.00000          osd.28                      up     1.00000  1.00000
 29   ssd 1.00000          osd.29                      up     1.00000  1.00000
 -1       0        root default
root@ceph-mon01:~#

Thanks,
Anastasios