Re: Overlapping Roots - How to Fix?

On 23-09-2024 16:31, Janne Johansson wrote:
> On Mon, 23 Sep 2024 at 16:23, Stefan Kooman <stefan@xxxxxx> wrote:
>> On 23-09-2024 16:04, Dave Hall wrote:
>>> Thank you to everybody who has responded to my questions.
>>>
>>> At this point I think I am starting to understand. However, I am still
>>> trying to understand the potential for data loss.
>>>
>>> In particular:
>>>
>>>     - In some ways it seems that as long as there is sufficient OSD capacity
>>>       available, the worst that can happen from a bad CRUSH map is poor
>>>       placement and poor performance. Is this correct?
>>
>> If you have a (new) CRUSH rule without any OSD mappings, all PGs
>> for pools that use that rule go into an inactive state, i.e.
>> downtime. So when you create a (new) rule you have to check that
>> CRUSH can indeed find enough OSDs to comply with the policy you defined.
>
> Are you sure? I have asked some pools to use an "impossible" crush
> rule after creation and the PGs only end up as "misplaced".

Apparently it depends ... You are right with regard to newly created pools and
the inactive state (at least that is what I have seen in all cases). If I have
a pool use a CRUSH rule that cannot map any OSDs (an nvme device-class rule
while no OSDs carry the nvme device class), the PGs become "unknown" (not
state "inactive" like I said). But IO for that pool does not work at that
point: both the acting and the up set show "[]p-1", i.e. no OSDs available.
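
For anyone who wants to reproduce it, a rough sketch of the test (the rule and
pool names below are made up for illustration, run on a throwaway cluster with
no OSDs of device class "nvme"):

  ceph osd crush rule create-replicated nvme_rule default host nvme
  ceph osd pool create testpool 32 32 replicated nvme_rule
  ceph pg ls-by-pool testpool   # PGs stay "unknown", up/acting sets empty, no primary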

However, if I switch the pool back to a valid rule, and then back again to the
invalid rule, the PGs become "active+clean+remapped" and IO does work (the up
set is []p-1, but the acting set still holds the previously mapped OSDs). That
is probably the same state you have seen in your cluster (and I have seen it
in Reef clusters as well). My tests were performed on a 16.2.11 test cluster.
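
The rule flip that produces that state, continuing with the made-up names from
the sketch above ("replicated_rule" is the stock default rule):

  ceph osd pool set testpool crush_rule replicated_rule   # valid rule: PGs go active+clean
  ceph osd pool set testpool crush_rule nvme_rule         # back to the impossible rule
  ceph pg ls-by-pool testpool   # now active+clean+remapped; acting set keeps the old OSDs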

> At creation they might stay inactive until a good place for them can be
> found, but then you can't write data to it, so it is not really a
> "data-loss" scenario if the pool never started.

Correct, no data loss in that situation.
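
For reference, a rule can be sanity-checked before any pool is pointed at it
by running the compiled CRUSH map through crushtool; a minimal sketch (the
rule id and replica count are examples, not taken from this thread):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings
  # or --show-bad-mappings to list only the inputs CRUSH could not map fully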

Gr. Stefan




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



