Re: ceph-volume lvm batch OSD replacement

Jan Fajerski <jfajerski@xxxxxxxx> · Tue, 19 Mar 2019 16:31:20 +0100

On Tue, Mar 19, 2019 at 02:17:56PM +0100, Dan van der Ster wrote:
> On Tue, Mar 19, 2019 at 1:05 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>
> > On Tue, Mar 19, 2019 at 7:26 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 19, 2019 at 12:17 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Mar 19, 2019 at 7:00 AM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Mar 19, 2019 at 6:47 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We've just hit our first OSD replacement on a host created with
> > > > > > `ceph-volume lvm batch` with mixed hdds+ssds.
> > > > > >
> > > > > > The hdd /dev/sdq was prepared like this:
> > > > > >    # ceph-volume lvm batch /dev/sd[m-r] /dev/sdac --yes
> > > > > >
> > > > > > Then /dev/sdq failed and was zapped like this:
> > > > > >    # ceph-volume lvm zap /dev/sdq --destroy
> > > > > >
> > > > > > The zap removed the pv/vg/lv from sdq, but left behind the db on
> > > > > > /dev/sdac (see P.S.)
> > > > >
> > > > > That is correct behavior for the zap command used.
> > > > >
> > > > > >
> > > > > > Now we've replaced /dev/sdq and we're wondering how to proceed. We see
> > > > > > two options:
> > > > > >   1. reuse the existing db lv from osd.240 (Though the osd fsid will
> > > > > > change when we re-create, right?)
> > > > >
> > > > > This is possible but you are right that in the current state, the FSID
> > > > > and other cluster data exist in the LV metadata. To reuse this LV for
> > > > > a new (replaced) OSD, you would need to zap the LV *without* the
> > > > > --destroy flag, which would clear all metadata on the LV and do a
> > > > > wipefs. The command would need the full path to the LV associated
> > > > > with osd.240, something like:
> > > > >
> > > > > ceph-volume lvm zap /dev/ceph-osd-lvs/db-lv-240
> > > > >
> > > > > >   2. remove the db lv from sdac then run
> > > > > >         # ceph-volume lvm batch /dev/sdq /dev/sdac
> > > > > >      which should do the correct thing.
> > > > >
> > > > > This would also work if the db lv is fully removed with --destroy.
> > > > >
> > > > > >
> > > > > > This is all v12.2.11 btw.
> > > > > > If (2) is the preferred approach, then it looks like a bug that the
> > > > > > db lv was not destroyed by lvm zap --destroy.
> > > > >
> > > > > Since /dev/sdq was passed in to zap, just that one device was removed,
> > > > > so this is working as expected.
> > > > >
> > > > > Alternatively, zap has the ability to destroy or zap LVs associated
> > > > > with an OSD ID. I think this is not released yet for Luminous but
> > > > > should be in the next release (which seems to be what you want)
> > > >
> > > > Seems like 12.2.11 was released with the ability to zap by OSD ID. You
> > > > can also zap by OSD FSID; both ways will zap (and optionally destroy, if
> > > > using --destroy) all LVs associated with the OSD.
> > > >
> > > > Full examples on this can be found here:
> > > >
> > > > http://docs.ceph.com/docs/luminous/ceph-volume/lvm/zap/#removing-devices
> > > >
> > >
> > > Ohh that's an improvement! (Our goal is outsourcing the failure
> > > handling to non-ceph experts, so this will help simplify things.)
> > >
> > > In our example, the operator needs to know the osd id, then can do:
> > >
> > > 1. ceph-volume lvm zap --destroy --osd-id 240 (wipes sdq and removes
> > > the lvm from sdac for osd.240)
> > > 2. replace the hdd
> > > 3. ceph-volume lvm batch /dev/sdq /dev/sdac --osd-ids 240
> > >
> > > But I just remembered that the --osd-ids flag hasn't been backported
> > > to luminous, so we can't yet do that. I guess we'll follow the first
> > > (1) procedure to re-use the existing db lv.
> >
> > It has! (I initially thought it hadn't.) Check if `ceph-volume lvm zap
> > --help` has the flags available; I think they should appear for
> > 12.2.11.
>
> Is it there? Indeed I see zap --osd-id, but for the recreation I'm
> referring to batch --osd-ids, which afaict is only in nautilus:
>
> https://github.com/ceph/ceph/blob/nautilus/src/ceph-volume/ceph_volume/devices/lvm/batch.py#L248

Right, this PR was not backported yet: https://github.com/ceph/ceph/pull/25542
I'll get on that.

We'll probably need to look at how c-v is developed now that nautilus is out.
Maintaining three branches (luminous, mimic, nautilus), and more in the future,
with essentially the same code makes no sense and adds plenty of unnecessary
work.

> -- dan
>
> > >
> > > -- dan
> > >
> > > > >
> > > > > >
> > > > > > Once we sort this out, we'd be happy to contribute to the ceph-volume
> > > > > > lvm batch doc.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Dan
> > > > > >
> > > > > > P.S:
> > > > > >
> > > > > > ===== osd.240 ======
> > > > > >
> > > > > >   [  db]    /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > > > > >
> > > > > >       type                      db
> > > > > >       osd id                    240
> > > > > >       cluster fsid              b4f463a0-c671-43a8-bd36-e40ab8d233d2
> > > > > >       cluster name              ceph
> > > > > >       osd fsid                  d4d1fb15-a30a-4325-8628-706772ee4294
> > > > > >       db device                 /dev/ceph-094c06db-98dc-47f6-a7e5-1092b099b372/osd-block-db-fa0e7927-dc3e-44d0-a8ce-1d8202fa75dd
> > > > > >       encrypted                 0
> > > > > >       db uuid                   iWWdyU-UhNu-b58z-ThSp-Bi3B-19iA-06iJIc
> > > > > >       cephx lockbox secret
> > > > > >       block uuid                u4326A-Q8bH-afPb-y7Y6-ftNf-TE1X-vjunBd
> > > > > >       block device              /dev/ceph-f78ff8a3-803d-4b6d-823b-260b301109ac/osd-data-9e4bf34d-1aa3-4c0a-9655-5dba52dcfcd7
> > > > > >       vdo                       0
> > > > > >       crush device class        None
> > > > > >       devices                   /dev/sdac
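
For reference, the two replacement paths discussed in this thread can be
written down as a small dry-run sketch. It only prints the ceph-volume
commands quoted in the thread and runs nothing. The function names, the
argument order and the optional "yes" switch are illustrative assumptions,
not part of ceph-volume, and --osd-ids should be left off on releases where
batch does not have it.

```shell
#!/bin/sh
# Dry-run sketch: prints commands for review, never executes ceph-volume.
# Function names and argument order are hypothetical helpers for this example.

# Option 1 from the thread: keep the db LV and zap it *without* --destroy,
# which clears its metadata so it can be reused by the replacement OSD.
print_reuse_db_lv() {
    db_lv=$1    # full path to the db LV, as shown by "ceph-volume lvm list"
    echo "ceph-volume lvm zap ${db_lv}"
}

# Option 2 from the thread: destroy every LV that belongs to the OSD id,
# then let batch recreate the data and db LVs after the disk swap.
# Pass "yes" as a fourth argument only where batch supports --osd-ids.
print_destroy_and_batch() {
    osd_id=$1
    data_dev=$2
    db_dev=$3
    keep_id=${4:-no}
    echo "ceph-volume lvm zap --destroy --osd-id ${osd_id}"
    if [ "${keep_id}" = "yes" ]; then
        echo "ceph-volume lvm batch ${data_dev} ${db_dev} --osd-ids ${osd_id}"
    else
        echo "ceph-volume lvm batch ${data_dev} ${db_dev}"
    fi
}
```

Calling `print_destroy_and_batch 240 /dev/sdq /dev/sdac` prints the zap and
batch commands for osd.240 so that an operator can check them before running
anything by hand.
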
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)