I may have misread your original email, for which I apologize. If you do a 'ceph orch device ls' does the NVME in question show available? On that host with the failed OSD, if you lvs/lsblk do you see the old DB on the NVME still? I'm not sure if the replacement process you followed will work. Here's what we do on OSD pre-failure as well as failures on nodes with NVME backing the OSD for DB/WAL: In cephadm shell, on host with drive to replace (in this example, let's say 391 on a node called ceph15): # capture "db device" and raw device associated with OSD (just for safety) ceph-volume lvm list | less # drain drive if possible, do this when planning replacement, otherwise do once failure has occurred ceph orch osd rm 391 --replace # One drained (or if failure occurred) (we don't use the orch version yet because we've had issues with it) ceph-volume lvm zap --osd-id 391 --destroy # refresh devices ceph orch device ls --refresh # monitor ceph for replacement ceph -W cephadm # once daemon has been deployed "2021-03-25T18:03:16.742483+0000 mgr.ceph02.duoetc [INF] Deploying daemon osd.391 on ceph15", watch for rebalance to complete ceph -s # consider increasing max_backfills if it's just a single drive replacement: ceph config set osd osd_max_backfills 10 # if you do, after backfilling is complete (validate with 'ceph -s'): ceph config rm osd osd_max_backfills The lvm zap cleans up the db/wal LV, which allows for the replacement drive to rebuild with db/wal on the NVME. Hope this helps, David On Fri, Aug 27, 2021 at 7:21 PM Eric Fahnle <efahnle@xxxxxxxxxxx> wrote: > > Hi David! Very much appreciated your response. > > I'm not sure that may be the problem. I tried with the following (without using "rotational"): > > ...(snip)... > data_devices: > size: "15G:" > db_devices: > size: ":15G" > filter_logic: AND > placement: > label: "osdj2" > service_id: test_db_device > service_type: osd > ...(snip)... > > Without success. Also tried without the "filter_logic: AND" in the yaml file and the result was the same. > > Best regards, > Eric > > > -----Original Message----- > From: David Orman [mailto:ormandj@xxxxxxxxxxxx] > Sent: 27 August 2021 14:56 > To: Eric Fahnle > Cc: ceph-users@xxxxxxx > Subject: Re: Missing OSD in SSD after disk failure > > This was a bug in some versions of ceph, which has been fixed: > > https://tracker.ceph.com/issues/49014 > https://github.com/ceph/ceph/pull/39083 > > You'll want to upgrade Ceph to resolve this behavior, or you can use size or something else to filter if that is not possible. > > David > > On Thu, Aug 19, 2021 at 9:12 AM Eric Fahnle <efahnle@xxxxxxxxxxx> wrote: > > > > Hi everyone! > > I've got a doubt, I tried searching for it in this list, but didn't find an answer. > > > > I've got 4 OSD servers. Each server has 4 HDDs and 1 NVMe SSD disk. The deployment was done with "ceph orch apply deploy-osd.yaml", in which the file "deploy-osd.yaml" contained the following: > > --- > > service_type: osd > > service_id: default_drive_group > > placement: > > label: "osd" > > data_devices: > > rotational: 1 > > db_devices: > > rotational: 0 > > > > After the deployment, each HDD had an OSD and the NVMe shared the 4 OSDs, plus the DB. > > > > A few days ago, an HDD broke and got replaced. Ceph detected the new disk and created a new OSD for the HDD but didn't use the NVMe. Now the NVMe in that server has 3 OSDs running but didn't add the new one. I couldn't find out how to re-create the OSD with the exact configuration it had before. The only "way" I found was to delete all 4 OSDs and create everything from scratch (I didn't actually do it, as I hope there is a better way). > > > > Has anyone had this issue before? I'd be glad if someone pointed me in the right direction. > > > > Currently running: > > Version > > 15.2.8 > > octopus (stable) > > > > Thank you in advance and best regards, Eric > > _______________________________________________ > > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an > > email to ceph-users-leave@xxxxxxx _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx