I have a feeling that this could be related to the drives, but I have
no real proof. I drained the SSD OSDs yesterday; hours later I wanted
to remove the OSDs (no PGs were left on them) via a for loop with the
orchestrator (ceph orch osd rm ID --force --zap). The first one was
removed quite quickly, but the others disappeared from the queue. I
tried again and watched the iostat output on the node the queued OSD
was running on. The drive had no IO at all but was 100% utilized for
several minutes until the OSD was eventually removed.
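For reference, the removal loop looked roughly like this (a minimal
sketch; the OSD IDs 10-12 are placeholders, not the real ones):

  # queue the removals back to back; --zap wipes the device afterwards
  for id in 10 11 12; do
      ceph orch osd rm "$id" --force --zap
  done

  # on the affected node, watch drive activity while the removal is queued
  iostat -x 5

  # on the cluster, check the removal queue
  ceph orch osd rm status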
I find that very weird, especially since we’re currently helping a
customer rebuild OSDs on Pacific as well and I haven’t seen such
behavior there yet, even though we have already redeployed 132 OSDs.
Quoting Eugen Block <eblock@xxxxxx>:
Hi,
I'm not sure if this has been asked before, or if there's an
existing tracker issue for it. It's difficult to reproduce on my
lab clusters.
I'm testing some new SSD OSDs on a Pacific cluster (16.2.15) and
noticed that if we instruct the orchestrator to remove two or three
OSDs (issuing the command 'ceph orch osd rm {ID}' a couple of
times), it eventually only removes the first in the queue. I've been
watching 'ceph orch osd rm status' to see the progress, and then the
rest of the queued OSDs suddenly vanish from the status and never
get removed. Then I have to issue the command again. If I remove the
OSDs one by one, so that no others are in the queue, they are all
removed successfully. Why is this happening? Is this a bug in Pacific?
Has someone seen this in newer releases?
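To make that concrete, the reproduction looks roughly like this (the
OSD IDs are placeholders):

  # queue two or three removals in a row
  ceph orch osd rm 5
  ceph orch osd rm 6
  ceph orch osd rm 7

  # watch the queue; after the first OSD is removed, the remaining
  # entries vanish from this output and the OSDs stay in place
  watch ceph orch osd rm status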
Thanks!
Eugen
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx