Re: ceph-volume claiming wrong device

Hi,

First of all, if you really need to run ceph-volume manually, there's a batch command:

cephadm ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd /dev/sde

Second, are you using cephadm? Maybe your manual intervention conflicts with the automatic OSD setup (all available devices). You could look into /var/log/ceph/cephadm.log on each node to see if cephadm already tried to set up the OSDs for you. What does 'ceph orch ls' show? Did you end up with online OSDs, or did it fail? In that case I would purge all OSDs from the crushmap, then wipe all devices (ceph-volume lvm zap --destroy /dev/sdX) and either let cephadm create the OSDs for you or disable that (unmanaged=true) and run the manual steps again (although that's not really necessary).
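
Roughly, that cleanup could look something like this (a sketch only, assuming cephadm manages the cluster; the OSD IDs and device names are placeholders for your actual ones):

```
# stop cephadm from (re)creating OSDs automatically while you clean up
ceph orch apply osd --all-available-devices --unmanaged=true

# purge each broken OSD from the cluster/crushmap (repeat per OSD id)
ceph osd purge 2 --yes-i-really-mean-it

# wipe the devices on each node (repeat per device)
ceph-volume lvm zap --destroy /dev/sde

# then either let cephadm pick the devices up again (re-apply the spec
# without unmanaged) or run the batch command from above manually
cephadm ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd /dev/sde
```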

Regards,
Eugen

Quoting Oleksiy Stashok <oleksiys@xxxxxxxxxx>:

Hey guys,

I ran into a weird issue and hope you can explain what I'm observing. I'm testing *Ceph 16.2.10* on *Ubuntu 20.04* in *Google Cloud VMs*. I created 3 instances and attached 4 persistent SSD disks to each instance. I can see these disks attached as the `/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde` devices.

As a next step I used ceph-ansible to bootstrap the Ceph cluster on the 3 instances, but I intentionally skipped the OSD setup, so I ended up with a Ceph cluster without any OSDs.

I ssh'ed into each VM and ran:

```
sudo -s
for dev in sdb sdc sdd sde; do
  /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --dmcrypt --data "/dev/$dev"
done
```

The operation above fails on random instances/devices with something like:
```
bluefs _replay 0x0: stop: uuid e2f72ec9-2747-82d7-c7f8-41b7b6d41e1b != super.uuid 0110ddb3-d4bf-4c1e-be11-654598c71db0
```

The interesting thing is that when I run
```
/usr/sbin/ceph-volume lvm ls
```

I can see that the device for which OSD creation failed actually belongs to a different OSD that was previously created for a different device. For example, the failure I mentioned above happened on the `/dev/sde` device, and when I list the LVs I see this:
```
====== osd.2 =======

  [block]
/dev/ceph-103a4373-dbe0-43d6-a9e0-34db4e1b257c/osd-block-9af542ba-fd65-4355-ad17-7293856acaeb

      block device
/dev/ceph-103a4373-dbe0-43d6-a9e0-34db4e1b257c/osd-block-9af542ba-fd65-4355-ad17-7293856acaeb
      block uuid                FfFnLt-h33F-F73V-tY45-VuZM-scj7-C3dg1K
      cephx lockbox secret      AQAlelljqNPoMhAA59JwN3wGt0d6Si+nsnxsRQ==
      cluster fsid              348fff8e-e850-4774-9694-05d5414b1c53
      cluster name              ceph
      crush device class
      encrypted                 1
      osd fsid                  9af542ba-fd65-4355-ad17-7293856acaeb
      osd id                    2
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/sdd

  [block]
/dev/ceph-df14969f-2dfb-45f1-a579-a8e23ec12e33/osd-block-4686f6fc-8dc1-48fd-a2d9-70a281c8ee64

      block device
/dev/ceph-df14969f-2dfb-45f1-a579-a8e23ec12e33/osd-block-4686f6fc-8dc1-48fd-a2d9-70a281c8ee64
      block uuid                GEajK3-Tsyf-XZS9-E5ik-M1BB-VIpb-q7D1ET
      cephx lockbox secret      AQAwelljFw2nJBAApuMs2WE0TT+7c1TGa4xQzg==
      cluster fsid              348fff8e-e850-4774-9694-05d5414b1c53
      cluster name              ceph
      crush device class
      encrypted                 1
      osd fsid                  4686f6fc-8dc1-48fd-a2d9-70a281c8ee64
      osd id                    2
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/sde
```
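
For completeness, the same mapping can also be cross-checked with plain LVM tools, since ceph-volume stores its metadata as tags on each LV (just one way to look at it, shown here as an example):

```
# show which physical device backs each ceph LV, plus the ceph.* tags
# (osd id, osd fsid) that ceph-volume stores on the LV
sudo lvs -o lv_name,vg_name,devices,lv_tags
```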

How did it happen that `/dev/sde` was claimed by osd.2?

Thank you!
Oleksiy

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


