Re: Replacing OSD with containerized deployment


 



Oh wait,

I might have been too impatient:


1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06

1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}

1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}

1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}



Although it doesn't show the NVMe as WAL/DB yet, I will let it proceed to a clean state before doing anything further.
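
For reference, I would expect the drivegroup spec to handle the DB placement. Something along these lines should do it (just a sketch; I don't know the exact filters the dashboard-created spec dashboard-admin-1661788934732 uses, "ceph orch ls osd --export" shows the real one):

service_type: osd
service_id: dashboard-admin-1661788934732
placement:
  hosts:
    - ceph-a1-06
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0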


On 30.01.23 16:42, mailing-lists wrote:
root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
destroyed osd.232


OSD 232 now shows as destroyed and out in the dashboard.


root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
--> Zapping: /dev/sdm
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
--> Zapping successful for: <Raw Device: /dev/sdm>


root@ceph-a2-01:/# ceph orch device ls

ceph-a1-06  /dev/sdm      hdd   TOSHIBA_X_X 16.0T             21m ago *locked*


It shows as locked and is not automatically added now, which is good, I think? Otherwise it would probably end up as a new OSD 307.


root@ceph-a2-01:/# ceph orch osd rm status
No OSD remove/replace operations reported

root@ceph-a2-01:/# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']


Unfortunately it is still not replacing.
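
If I want to double-check what the cluster currently thinks of osd.232, something like this should show it (just my own sanity checks, not from the docs):

ceph osd dump | grep osd.232
ceph osd tree destroyed

The dump line should carry the destroyed flag, and the tree filter should list it among the destroyed OSDs.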


It is so weird; I tried this exact procedure in my virtual Ceph environment and it just worked. The real scenario is acting up now. -.-


Do you have more hints for me?

Thank you for your help so far!


Best

Ken


On 30.01.23 15:46, David Orman wrote:
The 'down' status is why it's not being replaced; a 'destroyed' status would allow the replacement. I'm not sure why --replace led to that scenario, but you will probably need to mark it destroyed for it to be replaced.

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd has instructions on the non-orch way of doing that. You only need steps 1 and 2.
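
Roughly (from memory, check the page for the exact invocation):

ceph osd destroy {id} --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX

The orchestrator should take care of creating the replacement once the device is clean.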

You should look through your logs to see why the OSD was marked down rather than destroyed. Obviously, make sure you understand the ramifications before running any commands. :)
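
If it helps, a couple of starting points (a sketch; adjust to your setup):

ceph log last 100 info cephadm
cephadm logs --name osd.232

The first shows recent cephadm/orchestrator messages from the mgr; the second, run on the OSD host, pulls the daemon's journal, assuming the daemon still exists there.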

David

On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
# ceph orch osd rm status
No OSD remove/replace operations reported
# ceph orch osd rm 232 --replace
Unable to find OSDs: ['232']

It is not finding 232 anymore. It is still shown as down and out in the
Ceph-Dashboard.


      pgs:     3236 active+clean


This is the new disk, shown as locked (because it has not been zapped yet).

# ceph orch device ls

ceph-a1-06  /dev/sdm      hdd   TOSHIBA_X_X 16.0T             9m ago   locked
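
"ceph orch device ls ceph-a1-06 --wide" should also show the reject reasons for the device, if that helps (a guess on my part).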


Best

Ken


On 29.01.23 18:19, David Orman wrote:
What does "ceph orch osd rm status" show before you try the zap? Is
your cluster still backfilling to the other OSDs for the PGs that were
on the failed disk?

David

On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
Dear Ceph-Users,

I am struggling to replace a disk. My Ceph cluster is not replacing the
old OSD even though I did:

ceph orch osd rm 232 --replace

OSD 232 is still shown in the OSD list, but the new HDD will be placed
as a new OSD. This wouldn't bother me much if the new OSD were also
placed with its BlueStore DB on the NVMe, but it isn't.


My steps:

"ceph orch osd rm 232 --replace"

Remove the failed HDD.

Add the new one.

Convert the disk within the server's BIOS, so that the node has direct
access to it.

It shows up as /dev/sdt.

Enter maintenance mode.

Reboot the server.

The drive is now /dev/sdm (which was the old drive's name).

"ceph orch device zap node-x /dev/sdm"

A new OSD is placed on the cluster.
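
What I expected, roughly (my understanding of the cephadm replacement flow, so treat it as a sketch):

ceph orch osd rm 232 --replace
ceph orch osd rm status
ceph orch device zap node-x /dev/sdm

i.e. drain and mark osd.232 destroyed while keeping its ID, watch the draining, swap the disk, zap it, and then the orchestrator recreates osd.232 on the new device.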


Can you give me a hint where I took a wrong turn? Why is the disk
not being used as OSD 232?


Best

Ken

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



