Re: ceph-volume lvm batch OSD replacement

Hi Dan,

I don't know about keeping the osd-id, but I just partially recreated your scenario: I wiped one OSD and recreated it. You are trying to re-use the existing block.db LV with the device path (--block.db /dev/vg-name/lv-name) instead of the LV notation (--block.db vg-name/lv-name):

# ceph-volume lvm create --data /dev/sdq --block.db /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd --osd-id 240

This fails in my test, too. But if I use the LV notation it works:

ceph-2:~ # ceph-volume lvm create --data /dev/sda --block.db ceph-journals/journal-osd3
[...]
Running command: /bin/systemctl enable --runtime ceph-osd@3
Running command: /bin/systemctl start ceph-osd@3
--> ceph-volume lvm activate successful for osd ID: 3
--> ceph-volume lvm create successful for: /dev/sda

This is a Nautilus test cluster, but I remember having this on a Luminous cluster, too. I hope this helps.
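
If you only have the /dev/<vg>/<lv> path at hand, a small helper can turn it into the vg/lv form before passing it to --block.db. This is just a sketch: lv_notation is a hypothetical function of my own, not part of ceph-volume, and it assumes the path is a plain /dev/<vg>/<lv> path (it refuses /dev/mapper names and bare devices such as /dev/sda).

```shell
# Hypothetical helper (not part of ceph-volume): convert an LV device path
# of the form /dev/<vg>/<lv> into the <vg>/<lv> notation.
lv_notation() {
    path="$1"
    case "$path" in
        /dev/mapper/*)
            # Device-mapper names encode vg and lv differently; not handled here.
            echo "unsupported device-mapper path: $path" >&2
            return 1 ;;
        /dev/*/*)
            # Strip the leading /dev/ to get vg/lv.
            printf '%s\n' "${path#/dev/}" ;;
        /*)
            # Some other absolute path, e.g. a raw device like /dev/sda.
            echo "not an LV path: $path" >&2
            return 1 ;;
        */*)
            # Already in vg/lv notation; pass it through unchanged.
            printf '%s\n' "$path" ;;
        *)
            echo "cannot interpret: $path" >&2
            return 1 ;;
    esac
}

lv_notation /dev/ceph-journals/journal-osd3   # prints ceph-journals/journal-osd3
```

The result could then be used as --block.db "$(lv_notation /dev/vg-name/lv-name)" in the create command.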

Regards,
Eugen


Quoting Dan van der Ster <dan@xxxxxxxxxxxxxx>:

On Tue, Mar 19, 2019 at 12:25 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>
> On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi all,
> > >
> > > We've just hit our first OSD replacement on a host created with
> > > `ceph-volume lvm batch` with mixed hdds+ssds.
> > >
> > > The hdd /dev/sdq was prepared like this:
> > >    # ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
> > >
> > > Then /dev/sdq failed and was then zapped like this:
> > >   # ceph-volume lvm zap /dev/sdq --destroy
> > >
> > > The zap removed the pv/vg/lv from sdq, but left behind the db on
> > > /dev/sdac (see P.S.)
> >
> > That is correct behavior for the zap command used.
> >
> > >
> > > Now we've replaced /dev/sdq and we're wondering how to proceed. We see
> > > two options:
> > >   1. reuse the existing db lv from osd.240 (Though the osd fsid will
> > > change when we re-create, right?)
> >
> > This is possible, but you are right that in the current state the FSID
> > and other cluster data exist in the LV metadata. To reuse this LV for
> > a new (replaced) OSD, you would need to zap the LV *without* the
> > --destroy flag, which would clear all metadata on the LV and do a
> > wipefs. The command would need the full path to the LV associated
> > with osd.240, something like:
> >
> > ceph-volume lvm zap /dev/ceph-osd-lvs/db-lv-240
> >
> > >   2. remove the db lv from sdac then run
> > >         # ceph-volume lvm batch /dev/sdq /dev/sdac
> > >      which should do the correct thing.
> >
> > This would also work if the db lv is fully removed with --destroy
> >
> > >
> > > This is all v12.2.11 btw.
> > > If (2) is the preferred approach, then it looks like a bug that the
> > > db lv was not destroyed by lvm zap --destroy.
> >
> > Since /dev/sdq was passed in to zap, just that one device was removed,
> > so this is working as expected.
> >
> > Alternatively, zap has the ability to destroy or zap LVs associated
> > with an OSD ID. I think this is not released yet for Luminous but
> > should be in the next release (which seems to be what you want)
>
> Seems like 12.2.11 was released with the ability to zap by OSD ID. You
> can also zap by OSD FSID; both ways will zap (and optionally destroy, if
> using --destroy) all LVs associated with the OSD.
>
> Full examples on this can be found here:
>
> http://docs.ceph.com/docs/luminous/ceph-volume/lvm/zap/#removing-devices
>
>

Ohh that's an improvement! (Our goal is outsourcing the failure
handling to non-ceph experts, so this will help simplify things.)

In our example, the operator needs to know the osd id, then can do:

1. ceph-volume lvm zap --destroy --osd-id 240 (wipes sdq and removes
the lvm from sdac for osd.240)
2. replace the hdd
3. ceph-volume lvm batch /dev/sdq /dev/sdac --osd-ids 240

But I just remembered that the --osd-ids flag hasn't been backported
to luminous, so we can't yet do that. I guess we'll follow the first
(1) procedure to re-use the existing db lv.
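
For the operator hand-off, the three steps above could be wrapped in a dry-run helper that only prints the commands for a given OSD. This is a rough sketch under assumptions: print_replacement_plan is a hypothetical name, nothing is executed, and the last command relies on the --osd-ids flag, which as noted is not available on Luminous yet.

```shell
# Dry-run sketch: prints the replacement commands instead of running them.
# print_replacement_plan is a hypothetical helper, not part of ceph-volume.
print_replacement_plan() {
    osd_id="$1"     # OSD to replace, e.g. 240
    data_dev="$2"   # the failed/replaced HDD, e.g. /dev/sdq
    db_dev="$3"     # the shared SSD holding the db LVs, e.g. /dev/sdac

    # Step 1: zap and destroy all LVs belonging to this OSD (data and db).
    echo "ceph-volume lvm zap --destroy --osd-id ${osd_id}"
    # Step 2: manual action, printed as a reminder only.
    echo "# physically replace ${data_dev} before continuing"
    # Step 3: recreate the OSD with the same ID (needs --osd-ids support).
    echo "ceph-volume lvm batch ${data_dev} ${db_dev} --osd-ids ${osd_id}"
}

print_replacement_plan 240 /dev/sdq /dev/sdac
```

Printing rather than executing keeps the sketch safe to try on a production host; an operator can review the output before pasting anything.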

Hmm... re-using the db lv didn't work.

We zapped it (see https://pastebin.com/N6PwpbYu) then got this error
when trying to create:

# ceph-volume lvm create --data /dev/sdq --block.db /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd --osd-id 240
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 9f63b457-37e0-4e33-971e-c0fc24658b65 240
Running command: vgcreate --force --yes ceph-8ef05e54-8909-49f8-951d-0f9d37aeba45 /dev/sdq
 stdout: Physical volume "/dev/sdq" successfully created.
 stdout: Volume group "ceph-8ef05e54-8909-49f8-951d-0f9d37aeba45" successfully created
Running command: lvcreate --yes -l 100%FREE -n osd-block-9f63b457-37e0-4e33-971e-c0fc24658b65 ceph-8ef05e54-8909-49f8-951d-0f9d37aeba45
 stdout: Logical volume "osd-block-9f63b457-37e0-4e33-971e-c0fc24658b65" created.
--> blkid could not detect a PARTUUID for device: /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be destroyed, keeping the ID because it was provided with --osd-id
Running command: ceph osd destroy osd.240 --yes-i-really-mean-it
 stderr: destroyed osd.240
-->  RuntimeError: unable to use device


Any idea?

-- dan

> >
> > >
> > > Once we sort this out, we'd be happy to contribute to the ceph-volume
> > > lvm batch doc.
> > >
> > > Thanks!
> > >
> > > Dan
> > >
> > > P.S:
> > >
> > > ===== osd.240 ======
> > >
> > > [ db] /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > >
> > >       type                      db
> > >       osd id                    240
> > >       cluster fsid              b4f463a0-c671-43a8-bd36-e40ab8d233d2
> > >       cluster name              ceph
> > >       osd fsid                  d4d1fb15-a30a-4325-8628-706772ee4294
> > >       db device                 /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > >       encrypted                 0
> > >       db uuid                   iWWdyU-UhNu-b58z-ThSp-Bi3B-19iA-06iJIc
> > >       cephx lockbox secret
> > >       block uuid                u4326A-Q8bH-afPb-y7Y6-ftNf-TE1X-vjunBd
> > >       block device              /dev/ceph-f78ff8a3-803d-4b6d-823b-260b301109ac/osd-data-9e4bf34d-1aa3-4c0a-9655-5dba52dcfcd7
> > >       vdo                       0
> > >       crush device class        None
> > >       devices                   /dev/sdac
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


