Re: Missing OSD in SSD after disk failure

Hi David, no problem, thanks for your help!

I went through your commands; here are the results:

- 4 servers with OSDs
- Server "nubceph04" has 2 OSDs (osd.0 and osd.7 on /dev/sdb and /dev/sdc respectively, with the db_device on /dev/sdd)

# capture "db device" and raw device associated with OSD (just for safety)
"ceph-volume lvm list" shows for each osd, which disks and lvs are in use (snipped):
====== osd.0 =======
  [block]       /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdb
  [db]          /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      block device              /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdd
====== osd.7 =======
  [block]       /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      block device              /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      osd id                    7
      devices                   /dev/sdc
  [db]          /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      block device              /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
      osd id                    7
      devices                   /dev/sdd


# drain drive if possible, do this when planning replacement, otherwise do once failure has occurred
# Try to remove osd.7

ceph orch osd rm 7 --replace
Scheduled OSD(s) for removal

I waited until it finished rebalancing, monitoring it with:
ceph -W cephadm
2021-08-30T18:05:32.280716-0300 mgr.nubvm02.viqmmr [INF] OSD <7> is not empty yet. Waiting a bit more
2021-08-30T18:06:03.374424-0300 mgr.nubvm02.viqmmr [INF] OSDs <[<OSD>(osd_id=7, is_draining=False)]> are now <down>
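
Side note: the pending/active removals can also be listed from the orchestrator's own queue; I'm only pointing at the command, the exact output varies between versions:

ceph orch osd rm status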

# Once drained (or if failure occurred) (we don't use the orch version
# yet because we've had issues with it)
ceph-volume lvm zap --osd-id 7 --destroy
--> Zapping: /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0853454 s, 123 MB/s
--> More than 1 LV left in VG, will proceed to destroy LV only
--> Removing LV because --destroy was given: /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
Running command: /usr/sbin/lvremove -v -f /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3
 stdout: Logical volume "osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3" successfully removed
 stderr: Removing ceph--block--dbs--08ee3a44--8503--40dd--9bdd--ed9a8f674a54-osd--block--db--fd2bd125--3f22--40f1--8524--744a100236f3 (253:3)
 stderr: Archiving volume group "ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54" metadata (seqno 9).
 stderr: Releasing logical volume "osd-block-db-fd2bd125-3f22-40f1-8524-744a100236f3"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54" (seqno 10).
--> Zapping: /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.054587 s, 192 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3
Running command: /usr/sbin/vgremove -v -f ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3
 stderr: Removing ceph--block--c3d30e81--ff7d--4007--9ad4--c16f852466a3-osd--block--42278e28--5274--4167--a014--6a6a956110ad (253:2)
 stderr: Archiving volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" metadata (seqno 5).
  Releasing logical volume "osd-block-42278e28-5274-4167-a014-6a6a956110ad"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" (seqno 6).
 stdout: Logical volume "osd-block-42278e28-5274-4167-a014-6a6a956110ad" successfully removed
 stderr: Removing physical volume "/dev/sdc" from volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3"
 stdout: Volume group "ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3" successfully removed
--> Zapping successful for OSD: 7

After that, the command:

ceph-volume lvm list

shows the same as above for osd.0, but nothing about osd.7.
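
To double-check at the LVM level that only osd.7's db LV was removed and that the shared db VG on /dev/sdd is still there (now with free space), something along these lines should work on the host; I'm only sketching the commands, not pasting my output:

lvs -o lv_name,vg_name,lv_size
vgs -o vg_name,lv_count,vg_free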

# refresh devices
ceph orch device ls --refresh

HOST       PATH      TYPE   SIZE  DEVICE_ID  MODEL         VENDOR  ROTATIONAL  AVAIL  REJECT REASONS
nubceph04  /dev/sda  hdd   19.0G             Virtual disk  VMware  1           False  locked
nubceph04  /dev/sdb  hdd   20.0G             Virtual disk  VMware  1           False  locked, Insufficient space (<5GB) on vgs, LVM detected
nubceph04  /dev/sdc  hdd   20.0G             Virtual disk  VMware  1           True
nubceph04  /dev/sdd  hdd   10.0G             Virtual disk  VMware  1           False  locked, LVM detected

After some time, it recreates osd.7, but without the db_device.

# monitor ceph for replacement
ceph -W cephadm
..
2021-08-30T18:11:22.439190-0300 mgr.nubvm02.viqmmr [INF] Deploying daemon osd.7 on nubceph04
..

I waited until it finished rebalancing. If I then run:

ceph-volume lvm list

it shows, for each OSD, which disks and LVs are in use (snipped):
====== osd.0 =======
  [block]       /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdb
  [db]          /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      block device              /dev/ceph-block-b301ec31-5779-4834-9fb7-e45afa45f803/osd-block-79d89e54-4a4b-4e89-aea3-72fa6aa343a5
      db device                 /dev/ceph-block-dbs-08ee3a44-8503-40dd-9bdd-ed9a8f674a54/osd-block-db-e7771b96-7a1d-43b2-a7d8-9204ef158224
      osd id                    0
      devices                   /dev/sdd
====== osd.7 =======
  [block]       /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      block device              /dev/ceph-block-c3d30e81-ff7d-4007-9ad4-c16f852466a3/osd-block-42278e28-5274-4167-a014-6a6a956110ad
      osd id                    7
      devices                   /dev/sdc

It seems it didn't recreate the LV in the "ceph-block-dbs" VG as it had before.

If I run everything again but with osd.0, it gets recreated correctly, because when running:

ceph-volume lvm zap --osd-id 0 --destroy

it doesn't print this line:

--> More than 1 LV left in VG, will proceed to destroy LV only

but rather this one:

--> Only 1 LV left in VG, will proceed to destroy volume group

As far as I can tell, if the db disk is not completely empty, cephadm just doesn't use it for the new OSD. Let me know if I wasn't clear enough.
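
In case it helps whoever hits this next: one way to review which filters cephadm is actually applying is to export the OSD service spec and, depending on the version, preview what it would do. This is only a sketch, and "osd-spec.yaml" is just a placeholder file name:

ceph orch ls osd --export > osd-spec.yaml
ceph orch apply -i osd-spec.yaml --dry-run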

Best regards,
Eric
________________________________
From: David Orman <ormandj@xxxxxxxxxxxx>
Sent: Monday, August 30, 2021 1:14 PM
To: Eric Fahnle <efahnle@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re:  Missing OSD in SSD after disk failure

I may have misread your original email, for which I apologize. If you
do a 'ceph orch device ls', does the NVMe in question show as available?
On the host with the failed OSD, if you run lvs/lsblk, do you see the old
DB on the NVMe still? I'm not sure the replacement process you followed
will work. Here's what we do on OSD pre-failure, as well as on failures, on
nodes with NVMe backing the OSDs for DB/WAL:

In a cephadm shell, on the host with the drive to replace (in this
example, let's say osd.391 on a node called ceph15):

# capture "db device" and raw device associated with OSD (just for safety)
ceph-volume lvm list | less

# drain drive if possible, do this when planning replacement,
# otherwise do once failure has occurred
ceph orch osd rm 391 --replace

# Once drained (or if failure occurred) (we don't use the orch version
# yet because we've had issues with it)
ceph-volume lvm zap --osd-id 391 --destroy

# refresh devices
ceph orch device ls --refresh

# monitor ceph for replacement
ceph -W cephadm

# once daemon has been deployed "2021-03-25T18:03:16.742483+0000
# mgr.ceph02.duoetc [INF] Deploying daemon osd.391 on ceph15", watch for
# rebalance to complete
ceph -s

# consider increasing max_backfills if it's just a single drive replacement:
ceph config set osd osd_max_backfills 10

# if you do, after backfilling is complete (validate with 'ceph -s'):
ceph config rm osd osd_max_backfills

The lvm zap cleans up the db/wal LV, which allows the replacement
drive to be rebuilt with db/wal on the NVMe.
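
As an illustration (not a required step), something like the following, run
on the host before and after the zap, should show whether the old db/wal LV
is still sitting in the NVMe-backed VG:

lvs -a -o lv_name,vg_name,lv_size,devices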

Hope this helps,
David

On Fri, Aug 27, 2021 at 7:21 PM Eric Fahnle <efahnle@xxxxxxxxxxx> wrote:
>
> Hi David! Very much appreciated your response.
>
> I'm not sure that's the problem. I tried with the following (without using "rotational"):
>
> ...(snip)...
> data_devices:
>    size: "15G:"
> db_devices:
>    size: ":15G"
> filter_logic: AND
> placement:
>   label: "osdj2"
> service_id: test_db_device
> service_type: osd
> ...(snip)...
>
> Without success. Also tried without the "filter_logic: AND" in the yaml file and the result was the same.
>
> Best regards,
> Eric
>
>
> -----Original Message-----
> From: David Orman [mailto:ormandj@xxxxxxxxxxxx]
> Sent: 27 August 2021 14:56
> To: Eric Fahnle
> Cc: ceph-users@xxxxxxx
> Subject: Re:  Missing OSD in SSD after disk failure
>
> This was a bug in some versions of ceph, which has been fixed:
>
> https://tracker.ceph.com/issues/49014
> https://github.com/ceph/ceph/pull/39083
>
> You'll want to upgrade Ceph to resolve this behavior, or you can use size or something else to filter if that is not possible.
>
> David
>
> On Thu, Aug 19, 2021 at 9:12 AM Eric Fahnle <efahnle@xxxxxxxxxxx> wrote:
> >
> > Hi everyone!
> > I have a question; I tried searching for it in this list, but didn't find an answer.
> >
> > I've got 4 OSD servers. Each server has 4 HDDs and 1 NVMe SSD disk. The deployment was done with "ceph orch apply deploy-osd.yaml", in which the file "deploy-osd.yaml" contained the following:
> > ---
> > service_type: osd
> > service_id: default_drive_group
> > placement:
> >   label: "osd"
> > data_devices:
> >   rotational: 1
> > db_devices:
> >   rotational: 0
> >
> > After the deployment, each HDD had an OSD and the NVMe shared the 4 OSDs, plus the DB.
> >
> > A few days ago, an HDD broke and got replaced. Ceph detected the new disk and created a new OSD for the HDD, but didn't use the NVMe. Now the NVMe in that server backs only 3 OSDs; the new one wasn't added to it. I couldn't find out how to re-create the OSD with the exact configuration it had before. The only "way" I found was to delete all 4 OSDs and create everything from scratch (I didn't actually do it, as I hope there is a better way).
> >
> > Has anyone had this issue before? I'd be glad if someone pointed me in the right direction.
> >
> > Currently running:
> > Version
> > 15.2.8
> > octopus (stable)
> >
> > Thank you in advance and best regards, Eric
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
> > email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


