OK, the OSD is backfilled again. It is in and up, but it is no longer using the NVMe WAL/DB.
And it looks like the LVM group of the old OSD is still on the NVMe drive. I suspect this because the two NVMe drives still have 9 LVM groups each: 18 groups in total, but only 17 OSDs are using the NVMe (as shown in the dashboard).
Do you have a hint on how to fix this?
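My current idea would be to track down the leftover DB LV of the old OSD and remove it roughly like this (the VG/LV names are placeholders; I would of course double-check them against the real output before destroying anything):
root@ceph-a1-06:/# lvs -o lv_name,vg_name,lv_tags | grep ceph   # LVs with their ceph tags (osd id, type=db)
root@ceph-a1-06:/# ceph-volume lvm list                         # same information from ceph-volume's point of view
root@ceph-a1-06:/# ceph-volume lvm zap --destroy ceph-db-vg-PLACEHOLDER/osd-db-PLACEHOLDER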
Best
Ken
On 30.01.23 16:50, mailing-lists wrote:
Oh wait,
I might have been too impatient:
1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06
1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup
dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
It doesn't show the NVMe as WAL/DB yet, though, but I will let it settle into a clean state before I do anything further.
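Once it has settled, I will check whether the rebuilt OSD really got a dedicated DB device, roughly like this (osd.232 as the example; the metadata field names may vary a bit between releases):
root@ceph-a2-01:/# ceph osd metadata 232 | grep -i db
root@ceph-a1-06:/# ceph-volume lvm list    # should list a separate [db] entry for osd.232 if the NVMe is used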
On 30.01.23 16:42, mailing-lists wrote:
root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
destroyed osd.232
OSD 232 now shows as destroyed and out in the dashboard.
root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
--> Zapping: /dev/sdm
--> --destroy was not specified, but zapping a whole device will
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10
conv=fsync
stderr: 10+0 records in
10+0 records out
stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
--> Zapping successful for: <Raw Device: /dev/sdm>
root@ceph-a2-01:/# ceph orch device ls
ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 21m ago *locked*
It shows as locked and is not automatically added now, which is good, I think? Otherwise it would probably become a new osd.307.
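To see why it is flagged, the wide/refreshed device listing should include the reject reasons (assuming a reasonably recent cephadm; adjust the flags if yours differs):
root@ceph-a2-01:/# ceph orch device ls ceph-a1-06 --wide --refresh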
root@ceph-a2-01:/# ceph orch osd rm status
No OSD remove/replace operations reported
root@ceph-a2-01:/# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']
Unfortunately it is still not replacing.
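If I understand it correctly, "ceph orch osd rm" only acts on OSDs the orchestrator still knows a daemon for, so after the earlier --replace/destroy there may be nothing left for it to act on. Checking what state the id is actually in should be possible with:
root@ceph-a2-01:/# ceph osd tree | grep -w 232     # does the id still exist, and is it marked destroyed?
root@ceph-a2-01:/# ceph osd dump | grep osd.232    # per-OSD state flags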
It is so weird; I tried exactly this procedure in my virtual Ceph environment and it just worked. The real cluster is acting up now. -.-
Do you have more hints for me?
Thank you for your help so far!
Best
Ken
On 30.01.23 15:46, David Orman wrote:
The 'down' status is why it's not being replaced; 'destroyed' is the state that would allow the replacement. I'm not sure why --replace led to that scenario, but you will probably need to mark it destroyed for it to be replaced.
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
has instructions on the non-orch way of doing that. You only need steps 1 and 2.
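Paraphrasing the first two steps from that page for osd.232 / /dev/sdm (re-check against the docs for your release before running anything):
ceph osd destroy 232 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdm
The prepare step on that page (ceph-volume lvm prepare --osd-id 232 --data /dev/sdm) is the part the orchestrator should take care of for you once the disk is clean.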
You should look through your logs to see why the OSD was marked down rather than destroyed. Obviously, make sure you understand the ramifications before running any commands. :)
David
On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
# ceph orch osd rm status
No OSD remove/replace operations reported
# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']
It is not finding 232 anymore. It is still shown as down and out in the Ceph dashboard.
pgs: 3236 active+clean
This is the new disk, shown as locked (because it has not been zapped yet).
# ceph orch device ls
ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 9m ago locked
Best
Ken
On 29.01.23 18:19, David Orman wrote:
What does "ceph orch osd rm status" show before you try the zap? Is
your cluster still backfilling to the other OSDs for the PGs that
were
on the failed disk?
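In other words, before wiping and re-using the disk it's worth confirming the data from the failed OSD has fully migrated, e.g.:
ceph -s                          # look for backfill/degraded PG states
ceph osd safe-to-destroy 232     # reports whether removing the id would reduce data durability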
David
On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
Dear Ceph-Users,
I am struggling to replace a disk. My Ceph cluster is not replacing the old OSD even though I did:
ceph orch osd rm 232 --replace
OSD 232 is still shown in the OSD list, but the new HDD gets placed as a new OSD. This wouldn't bother me much if the new OSD were also placed on the BlueStore DB / NVMe, but it isn't.
My steps:
"ceph orch osd rm 232 --replace"
Remove the failed HDD.
Add the new one.
Convert the disk within the server's BIOS so that the node has direct access to it.
It shows up as /dev/sdt.
Enter maintenance mode.
Reboot the server.
The drive is now /dev/sdm (which the old drive had).
"ceph orch device zap node-x /dev/sdm"
A new OSD is placed on the cluster.
Can you give me a hint as to where I took a wrong turn? Why is the disk not being used as OSD 232?
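For context, the OSD service spec (drivegroup) behind this can be dumped with "ceph orch ls osd --export". The sketch below only shows what such a spec roughly looks like, with placeholder filters, not my actual dashboard-admin spec:
service_type: osd
service_id: dashboard-admin-1661788934732
placement:
  host_pattern: '*'              # placeholder
spec:
  data_devices:
    rotational: 1                # HDDs become data devices
  db_devices:
    rotational: 0                # NVMe/SSD carry the BlueStore DB/WAL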
Best
Ken
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx