On 23-09-2024 16:04, Dave Hall wrote:
> Thank you to everybody who has responded to my questions.
> At this point I think I am starting to understand. However, I am still
> trying to understand the potential for data loss.
> In particular:
> - In some ways it seems that as long as there is sufficient OSD capacity
> available, the worst that can happen from a bad CRUSH map is poor placement
> and poor performance. Is this correct?
If you have a (new) CRUSH rule that cannot map any OSDs, all PGs for
pools that use that rule would go into an inactive state, i.e.
downtime. So when you create a (new) rule, you should check that
CRUSH can indeed find enough OSDs to comply with the policy you defined.
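One way to check this before injecting anything is crushtool's test mode, which simulates placement against the compiled map. A minimal sketch (the file name, rule id, and replica count below are placeholders for your own values):

```shell
# Fetch the current CRUSH map in compiled (binary) form.
ceph osd getcrushmap -o crush.bin

# Simulate placement for rule id 1 at a replica count of 3.
# --show-bad-mappings prints only the inputs CRUSH could not map,
# so empty output means the rule can find enough OSDs everywhere.
crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-bad-mappings
```

Run this for every rule/pool size combination you care about; any line of output indicates an input the rule failed to map fully.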
> - crushtool --compare - if the result of this command shows no
> mismatches, can we say that the adjusted CRUSH map is safe to apply?
Do you mean there are no differences? Then yes.
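For reference, a sketch of the comparison workflow (file names are placeholders; crush_new.bin would be your edited, recompiled candidate map):

```shell
# Save the map that is currently active in the cluster.
ceph osd getcrushmap -o crush_current.bin

# Compare the placements produced by the current map against the
# candidate map; crushtool reports how many mappings change.
crushtool -i crush_current.bin --compare crush_new.bin
```

Note that "no differences" means placement is unchanged; if you intended the new map to move data, some differences are expected, and the point of the comparison is to confirm the changes are the ones you wanted.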
> - If all of the 'inhibit flags' are turned on (no out, no down, no
> scrub/deep-scrub, no recover/rebalance/backfill, and perhaps pause) is it
> safe to apply an adjusted CRUSH map?
It should always be "safe". It just depends on the kind of CRUSHmap
you are injecting: one containing CRUSH rules without any valid
mappings might "break" things. Ceph will handle that and inform you
about it, but you might run into downtime when a "wrong" CRUSHmap is
injected. So you should always be careful and test the CRUSHmap
beforehand.
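For completeness, the flags mentioned in the question are set and cleared like this (a sketch; `pause` is listed separately because it stops client I/O entirely, which is usually more disruption than you want):

```shell
# Set the inhibit flags before injecting the adjusted map.
for flag in noout nodown noscrub nodeep-scrub norecover norebalance nobackfill; do
    ceph osd set "$flag"
done

# ... inject the new CRUSH map and check 'ceph status' here ...

# Clear the flags again once placement looks sane.
for flag in noout nodown noscrub nodeep-scrub norecover norebalance nobackfill; do
    ceph osd unset "$flag"
done
```

Keep in mind these flags only suppress reactions (marking OSDs out, data movement, scrubbing); they do not prevent PGs from going inactive if the injected map has rules with no valid mappings.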
> Is it safe to revert to the original
> CRUSH map if things don't seem quite right?
Yes. That's _exactly_ the thing you should do when things do not go
according to plan. So make sure you have a working CRUSHmap (backup)
that you can fall back to (ceph osd getcrushmap -o /tmp/crush_raw).
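The backup-and-revert pair, sketched end to end (the path is the one Stefan used above):

```shell
# Before changing anything: keep a binary copy of the known-good map.
ceph osd getcrushmap -o /tmp/crush_raw

# If the adjusted map misbehaves, inject the backup to revert.
ceph osd setcrushmap -i /tmp/crush_raw
```

Reverting triggers the same peering/rebalancing as any map change, so expect some data movement back to the original placement while the cluster settles.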
Gr. Stefan