Re: Migrating filestore to bluestore using ceph-volume

On Fri, Jan 26, 2018 at 5:00 PM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
> Bit late for this to be helpful, but instead of zapping the lvm labels, you
> could alternatively destroy the lvm volume by hand.
>
> lvremove -f <volume_group>/<logical_volume>
> vgremove <volume_group>
> pvremove /dev/ceph-device (should wipe labels)
>
>
> Then you should be able to run 'ceph-volume lvm zap /dev/sdX' and retry the
> 'ceph-volume lvm create' command (sans --osd-id flag), and it should succeed.
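>
> For example, with a placeholder device name:
>
> ceph-volume lvm zap /dev/sdX
> ceph-volume lvm create --bluestore --data /dev/sdX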
>
> This info will hopefully be useful for those not as well versed with lvm as I
> am (or was at the time I first needed it).
>
> Reed
>
> On Jan 26, 2018, at 11:32 AM, David Majchrzak <david@xxxxxxxxxx> wrote:
>
> Thanks that helped!
>
> Since I had already "halfway" created an lvm volume, I wanted to start from
> the beginning and zap it.
>
> I tried to zap the raw device, but it failed since --destroy doesn't seem to
> be in 12.2.2.
>
> http://docs.ceph.com/docs/master/ceph-volume/lvm/zap/
>
> root@int1:~# ceph-volume lvm zap /dev/sdc --destroy
> usage: ceph-volume lvm zap [-h] [DEVICE]
> ceph-volume lvm zap: error: unrecognized arguments: --destroy
>
> So I zapped it with the vg/lv path instead.
> ceph-volume lvm zap
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
>
> However, I couldn't run create on it since the LVM was already there.
> So I zapped it with sgdisk and ran dmsetup remove. After that I was able to
> create it again.
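>
> Roughly, something like this (the device is from my setup; the dm mapping
> name is a placeholder):
>
> sgdisk --zap-all /dev/sdc
> dmsetup ls        # find the ceph-* osd-block mapping
> dmsetup remove <dm_mapping_name>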
>
> However, each failed "ceph-volume lvm create" run still successfully added an
> osd to the crush map ;)

You are hitting a few known issues here, and some missing features
that are already in master (planned for Mimic) but not in Luminous.

* --osd-id cannot be used currently; just make sure the OSD is
destroyed so that ceph-volume can pick the next available ID (not
fixed yet: http://tracker.ceph.com/issues/22642)
* When creating an OSD fails, the ID is still created and left over
(fixed with: http://tracker.ceph.com/issues/22704)
* --destroy helps with full removal of logical volumes and their
groups when zapping (fixed, but only in master:
http://tracker.ceph.com/issues/22653)

To recap, borrowing from all the good suggestions here:

* Don't use --osd-id; just let ceph-volume grab the next one available
* Ensure that the ID is fully removed, including the auth
* If deploying the OSD fails, repeat the manual OSD removal and
remove the vg/lv by hand with `sudo vgremove <vg>` (this will remove
the lv and pv associated with it); a full sequence is sketched below
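
Putting the above together, a rough sequence would be (osd id, vg name and
device are placeholders for your environment):

ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm osd.<id>
sudo vgremove <vg>
ceph-volume lvm zap /dev/<device>
ceph-volume lvm create --bluestore --data /dev/<device>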

>
> So I've got this now:
>
> root@int1:~# ceph osd df tree
> ID CLASS WEIGHT  REWEIGHT SIZE  USE    AVAIL  %USE  VAR  PGS TYPE NAME
> -1       2.60959        - 2672G  1101G  1570G 41.24 1.00   - root default
> -2       0.87320        -  894G   369G   524G 41.36 1.00   -     host int1
>  3   ssd 0.43660  1.00000  447G   358G 90295M 80.27 1.95 301         osd.3
>  8   ssd 0.43660  1.00000  447G 11273M   436G  2.46 0.06  19         osd.8
> -3       0.86819        -  888G   366G   522G 41.26 1.00   -     host int2
>  1   ssd 0.43159  1.00000  441G   167G   274G 37.95 0.92 147         osd.1
>  4   ssd 0.43660  1.00000  447G   199G   247G 44.54 1.08 173         osd.4
> -4       0.86819        -  888G   365G   523G 41.09 1.00   -     host int3
>  2   ssd 0.43159  1.00000  441G   193G   248G 43.71 1.06 174         osd.2
>  5   ssd 0.43660  1.00000  447G   172G   274G 38.51 0.93 146         osd.5
>  0             0        0     0      0      0     0    0   0 osd.0
>  6             0        0     0      0      0     0    0   0 osd.6
>  7             0        0     0      0      0     0    0   0 osd.7
>
> I guess I can just remove them from crush and auth, and rm them?
>
> Kind Regards,
>
> David Majchrzak
>
> On Jan 26, 2018, at 6:09 PM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
>
> This is the exact issue that I ran into when starting my bluestore
> conversion journey.
>
> See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html
>
> Specifying --osd-id causes it to fail.
>
> Below are my steps for OSD replace/migrate from filestore to bluestore.
>
> BIG caveat here: I am doing a destructive replacement, in that I am not
> allowing my objects to be migrated off of the OSD I'm replacing before
> nuking it.
> With 8TB drives it just takes way too long, and I trust my failure domains
> and other hardware to get me through the backfills.
> So instead of 1) reading data off and writing it elsewhere, 2) removing and
> re-adding, and 3) reading it back and writing it onto the new OSD, I am
> taking step one out and trusting my two other copies of the objects. Just
> wanted to clarify my steps.
>
> I also set the norecover and norebalance flags immediately prior to running
> these commands so that the cluster doesn't try to start moving data
> unnecessarily. When done, I remove those flags and let it backfill.
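>
> Something like this, before and after the replacement steps below:
>
> ceph osd set norecover
> ceph osd set norebalance
> # ... run the replacement steps ...
> ceph osd unset norecover
> ceph osd unset norebalance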
>
> systemctl stop ceph-osd@$ID.service
> ceph-osd -i $ID --flush-journal
> umount /var/lib/ceph/osd/ceph-$ID
> ceph-volume lvm zap /dev/$DATA
> ceph osd crush remove osd.$ID
> ceph auth del osd.$ID
> ceph osd rm osd.$ID
> ceph-volume lvm create --bluestore --data /dev/$DATA --block.db /dev/$NVME
>
>
> So essentially I fully remove the OSD from crush and the osdmap, and when I
> add the OSD back, like I would a new OSD, it fills in the numeric gap with
> the $ID it had before.
>
> Hope this is helpful.
> Been working well for me so far, doing 3 OSDs at a time (half of a failure
> domain).
>
> Reed
>
> On Jan 26, 2018, at 10:01 AM, David <david@xxxxxxxxxx> wrote:
>
>
> Hi!
>
> On luminous 12.2.2
>
> I'm migrating some OSDs from filestore to bluestore using the "simple"
> method ("Mark out and Replace") as described in the docs:
> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds
>
> However, at step 9, ceph-volume create --bluestore --data $DEVICE --osd-id $ID,
> it seems to create the bluestore OSD but then fails to authenticate with the
> old osd-id auth.
> (The command above is also missing the lvm or simple subcommand.)
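>
> Presumably the intended command is something like:
>
> ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID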
>
> I think it's related to this:
> http://tracker.ceph.com/issues/22642
>
> # ceph-volume lvm create --bluestore --data /dev/sdc --osd-id 0
> Running command: sudo vgcreate --force --yes
> ceph-efad7df8-721d-43d8-8d02-449406e70b90 /dev/sdc
>  stderr: WARNING: lvmetad is running but disabled. Restart lvmetad before
> enabling it!
>  stdout: Physical volume "/dev/sdc" successfully created
>  stdout: Volume group "ceph-efad7df8-721d-43d8-8d02-449406e70b90"
> successfully created
> Running command: sudo lvcreate --yes -l 100%FREE -n
> osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
> ceph-efad7df8-721d-43d8-8d02-449406e70b90
>  stderr: WARNING: lvmetad is running but disabled. Restart lvmetad before
> enabling it!
>  stdout: Logical volume "osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9"
> created.
> Running command: sudo mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
> Running command: chown -R ceph:ceph /dev/dm-4
> Running command: sudo ln -s
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
> /var/lib/ceph/osd/ceph-0/block
> Running command: sudo ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
> /var/lib/ceph/osd/ceph-0/activate.monmap
>  stderr: got monmap epoch 2
> Running command: ceph-authtool /var/lib/ceph/osd/ceph-0/keyring
> --create-keyring --name osd.0 --add-key XXXXXXXX
>  stdout: creating /var/lib/ceph/osd/ceph-0/keyring
>  stdout: added entity osd.0 auth auth(auid = 18446744073709551615 key=
> XXXXXXXX with 0 caps)
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
> Running command: sudo ceph-osd --cluster ceph --osd-objectstore bluestore
> --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --key
> **************************************** --osd-data
> /var/lib/ceph/osd/ceph-0/ --osd-uuid 138ce507-f28a-45bf-814c-7fa124a9d9b9
> --setuser ceph --setgroup ceph
>  stderr: 2018-01-26 14:59:10.039549 7fd7ef951cc0 -1
> bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode
> label at offset 102: buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
> end of struct encoding
>  stderr: 2018-01-26 14:59:10.039744 7fd7ef951cc0 -1
> bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode
> label at offset 102: buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
> end of struct encoding
>  stderr: 2018-01-26 14:59:10.039925 7fd7ef951cc0 -1
> bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode
> label at offset 102: buffer::malformed_input: void
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past
> end of struct encoding
>  stderr: 2018-01-26 14:59:10.039984 7fd7ef951cc0 -1
> bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
>  stderr: 2018-01-26 14:59:11.359951 7fd7ef951cc0 -1 key XXXXXXXX
>  stderr: 2018-01-26 14:59:11.888476 7fd7ef951cc0 -1 created object store
> /var/lib/ceph/osd/ceph-0/ for osd.0 fsid
> efad7df8-721d-43d8-8d02-449406e70b90
> Running command: sudo ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
> --path /var/lib/ceph/osd/ceph-0
> Running command: sudo ln -snf
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
> /var/lib/ceph/osd/ceph-0/block
> Running command: chown -R ceph:ceph /dev/dm-4
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
> Running command: sudo systemctl enable
> ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9
>  stderr: Created symlink from
> /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9.service
> to /lib/systemd/system/ceph-volume@.service.
> Running command: sudo systemctl start ceph-osd@0
>
> ceph-osd.0.log shows:
>
> 2018-01-26 15:09:07.379039 7f545d3b9cc0  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/version_set.cc:2859] Recovered from
> manifest file:db/MANIFEST-000095 succeeded,manifest_file_number is 95,
> next_file_number is 97, last_sequence is 21, log_number is 0,prev_log_number
> is 0,max_column_family is 0
>
> 2018-01-26 15:09:07.379046 7f545d3b9cc0  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/version_set.cc:2867] Column family
> [default] (ID 0), log number is 94
>
> 2018-01-26 15:09:07.379087 7f545d3b9cc0  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1516979347379083, "job": 1, "event": "recovery_started",
> "log_files": [96]}
> 2018-01-26 15:09:07.379091 7f545d3b9cc0  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/db_impl_open.cc:482] Recovering log #96
> mode 0
> 2018-01-26 15:09:07.379102 7f545d3b9cc0  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/version_set.cc:2395] Creating manifest 98
>
> 2018-01-26 15:09:07.380466 7f545d3b9cc0  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1516979347380463, "job": 1, "event": "recovery_finished"}
> 2018-01-26 15:09:07.381331 7f545d3b9cc0  4 rocksdb:
> [/build/ceph-12.2.2/src/rocksdb/db/db_impl_open.cc:1063] DB pointer
> 0x556ecb8c3000
> 2018-01-26 15:09:07.381353 7f545d3b9cc0  1
> bluestore(/var/lib/ceph/osd/ceph-0) _open_db opened rocksdb path db options
> compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
> 2018-01-26 15:09:07.381616 7f545d3b9cc0  1 freelist init
> 2018-01-26 15:09:07.381660 7f545d3b9cc0  1
> bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc opening allocation metadata
> 2018-01-26 15:09:07.381679 7f545d3b9cc0  1
> bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc loaded 447 G in 1 extents
> 2018-01-26 15:09:07.382077 7f545d3b9cc0  0 _get_class not permitted to load
> kvs
> 2018-01-26 15:09:07.382309 7f545d3b9cc0  0 <cls>
> /build/ceph-12.2.2/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> 2018-01-26 15:09:07.382583 7f545d3b9cc0  0 _get_class not permitted to load
> sdk
> 2018-01-26 15:09:07.382827 7f545d3b9cc0  0 <cls>
> /build/ceph-12.2.2/src/cls/hello/cls_hello.cc:296: loading cls_hello
> 2018-01-26 15:09:07.385755 7f545d3b9cc0  0 _get_class not permitted to load
> lua
> 2018-01-26 15:09:07.386073 7f545d3b9cc0  0 osd.0 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-26 15:09:07.386078 7f545d3b9cc0  0 osd.0 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-26 15:09:07.386079 7f545d3b9cc0  0 osd.0 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-26 15:09:07.386132 7f545d3b9cc0  0 osd.0 0 load_pgs
> 2018-01-26 15:09:07.386134 7f545d3b9cc0  0 osd.0 0 load_pgs opened 0 pgs
> 2018-01-26 15:09:07.386137 7f545d3b9cc0  0 osd.0 0 using weightedpriority op
> queue with priority op cut off at 64.
> 2018-01-26 15:09:07.386580 7f545d3b9cc0 -1 osd.0 0 log_to_monitors
> {default=true}
> 2018-01-26 15:09:07.388077 7f545d3b9cc0 -1 osd.0 0 init authentication
> failed: (1) Operation not permitted
>
>
> The old osd is still there.
>
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME     STATUS    REWEIGHT PRI-AFF
> -1       2.60458 root default
> -2       0.86819     host int1
>  0   ssd 0.43159         osd.0 destroyed        0 1.00000
>  3   ssd 0.43660         osd.3        up  1.00000 1.00000
> -3       0.86819     host int2
>  1   ssd 0.43159         osd.1        up  1.00000 1.00000
>  4   ssd 0.43660         osd.4        up  1.00000 1.00000
> -4       0.86819     host int3
>  2   ssd 0.43159         osd.2        up  1.00000 1.00000
>  5   ssd 0.43660         osd.5        up  1.00000 1.00000
>
>
> What's the best course of action? Purging osd.0, zapping the device again
> and creating without --osd-id set?
>
>
> Kind Regards,
>
> David Majchrzak
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



