Hey Eugen,

Valid points. I first tried to provision the OSDs via ceph-ansible (later ruled out), which does run the batch command with all 4 disk devices, but it often failed with the same issue I mentioned earlier, something like:

```
bluefs _replay 0x0: stop: uuid e2f72ec9-2747-82d7-c7f8-41b7b6d41e1b != super.uuid 0110ddb3-d4bf-4c1e-be11-654598c71db0
```

That's why I abandoned that idea and tried to provision the OSDs manually, one by one. As I mentioned, I used ceph-ansible rather than cephadm for legacy reasons, but I suspect the problem I'm seeing comes from ceph-volume itself, so cephadm probably wouldn't change anything.

I dug further into the one-by-one OSD creation flow, and the fact that `ceph-volume lvm list` shows two devices belonging to the same OSD seems to be explained by the following sequence:

1. `ceph-volume lvm create --bluestore --dmcrypt --data /dev/sdd`
2. it tries to create osd.2
3. it fails with the uuid != super.uuid error above
4. `ceph-volume lvm list` now reports /dev/sdd as belonging to osd.2 (even though the create failed)
5. `ceph-volume lvm create --bluestore --dmcrypt --data /dev/sde`
6. it tries to create osd.2 (*again*)
7. this time it succeeds
8. `ceph-volume lvm list` reports both /dev/sdd and /dev/sde as belonging to osd.2

osd.2 is reported to be up and running. Any idea why this is happening?
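For completeness, this is the kind of check and cleanup I'm assuming would untangle it. It relies on the ceph.* LVM tags that ceph-volume sets, and on my assumption that the LV carved out of /dev/sdd is the stale leftover while the running osd.2 lives on /dev/sde:

```
# Show which LVs carry ceph-volume tags claiming osd id 2; the ceph.osd_fsid
# tag should match the uuid that `ceph osd dump` reports for osd.2.
sudo lvs -o lv_name,vg_name,lv_tags | grep 'ceph.osd_id=2'
ceph osd dump | grep '^osd.2 '

# If the LV on /dev/sdd really is the leftover of the failed attempt, wipe it
# so the disk can be re-provisioned (device name is specific to my test VMs).
sudo ceph-volume lvm zap --destroy /dev/sdd
```

That would clean things up, but it still wouldn't explain why the second create reused osd id 2 instead of allocating a new one.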
Thank you!
Oleksiy

On Thu, Oct 27, 2022 at 12:11 AM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> first of all, if you really need to issue ceph-volume manually,
> there's a batch command:
>
> cephadm ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd /dev/sde
>
> Second, are you using cephadm? Maybe your manual intervention
> conflicts with the automatic OSD setup (all available devices). You
> could look into /var/log/ceph/cephadm.log on each node and see if
> cephadm already tried to set up the OSDs for you. What does 'ceph orch
> ls' show?
> Did you end up having online OSDs or did it fail? In that case I would
> purge all OSDs from the crushmap, then wipe all devices (ceph-volume
> lvm zap --destroy /dev/sdX) and either let cephadm create the OSDs for
> you, or disable that (unmanaged=true) and run the manual steps
> again (although it's not really necessary).
>
> Regards,
> Eugen
>
> Quoting Oleksiy Stashok <oleksiys@xxxxxxxxxx>:
>
> > Hey guys,
> >
> > I ran into a weird issue, hope you can explain what I'm observing. I'm
> > testing *Ceph 16.2.10* on *Ubuntu 20.04* in *Google Cloud VMs*. I created
> > 3 instances and attached 4 persistent SSD disks to each instance. I can
> > see these disks attached as the `/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde`
> > devices.
> >
> > As a next step I used ceph-ansible to bootstrap the Ceph cluster on the
> > 3 instances, however I intentionally skipped the OSD setup. So I ended
> > up with a Ceph cluster without any OSDs.
> >
> > I ssh'ed into each VM and ran:
> >
> > ```
> > sudo -s
> > for dev in sdb sdc sdd sde; do
> >     /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore \
> >         --dmcrypt --data "/dev/$dev"
> > done
> > ```
> >
> > The operation above randomly fails on random instances/devices with
> > something like:
> >
> > ```
> > bluefs _replay 0x0: stop: uuid e2f72ec9-2747-82d7-c7f8-41b7b6d41e1b !=
> > super.uuid 0110ddb3-d4bf-4c1e-be11-654598c71db0
> > ```
> >
> > The interesting thing is that when I run
> >
> > ```
> > /usr/sbin/ceph-volume lvm ls
> > ```
> >
> > I can see that the device for which the OSD creation failed actually
> > belongs to a different OSD that was previously created for a different
> > device. For example, the failure I mentioned above happened on the
> > `/dev/sde` device, and when I list the LVs I see this:
> >
> > ```
> > ====== osd.2 =======
> >
> >   [block]       /dev/ceph-103a4373-dbe0-43d6-a9e0-34db4e1b257c/osd-block-9af542ba-fd65-4355-ad17-7293856acaeb
> >
> >       block device              /dev/ceph-103a4373-dbe0-43d6-a9e0-34db4e1b257c/osd-block-9af542ba-fd65-4355-ad17-7293856acaeb
> >       block uuid                FfFnLt-h33F-F73V-tY45-VuZM-scj7-C3dg1K
> >       cephx lockbox secret      AQAlelljqNPoMhAA59JwN3wGt0d6Si+nsnxsRQ==
> >       cluster fsid              348fff8e-e850-4774-9694-05d5414b1c53
> >       cluster name              ceph
> >       crush device class
> >       encrypted                 1
> >       osd fsid                  9af542ba-fd65-4355-ad17-7293856acaeb
> >       osd id                    2
> >       osdspec affinity
> >       type                      block
> >       vdo                       0
> >       devices                   /dev/sdd
> >
> >   [block]       /dev/ceph-df14969f-2dfb-45f1-a579-a8e23ec12e33/osd-block-4686f6fc-8dc1-48fd-a2d9-70a281c8ee64
> >
> >       block device              /dev/ceph-df14969f-2dfb-45f1-a579-a8e23ec12e33/osd-block-4686f6fc-8dc1-48fd-a2d9-70a281c8ee64
> >       block uuid                GEajK3-Tsyf-XZS9-E5ik-M1BB-VIpb-q7D1ET
> >       cephx lockbox secret      AQAwelljFw2nJBAApuMs2WE0TT+7c1TGa4xQzg==
> >       cluster fsid              348fff8e-e850-4774-9694-05d5414b1c53
> >       cluster name              ceph
> >       crush device class
> >       encrypted                 1
> >       osd fsid                  4686f6fc-8dc1-48fd-a2d9-70a281c8ee64
> >       osd id                    2
> >       osdspec affinity
> >       type                      block
> >       vdo                       0
> >       devices                   /dev/sde
> > ```
> >
> > How did it happen that `/dev/sde` was claimed by osd.2?
> >
> > Thank you!
> > Oleksiy

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx