Re: filestore to bluestore: osdmap epoch problem and is the documentation correct?

On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen <jmozdzen@xxxxxx> wrote:
> Dear *,
>
> has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
> keeping the OSD number? There have been a number of messages on the list,
> reporting problems, and my experience is the same. (Removing the existing
> OSD and creating a new one does work for me.)
>
> I'm working on a Ceph 12.2.2 cluster and tried following
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
> - this basically says
>
> 1. destroy old OSD
> 2. zap the disk
> 3. prepare the new OSD
> 4. activate the new OSD
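
For reference, the documented steps map to roughly the following
commands (the id 999, device /dev/sdzz and the osd fsid are
placeholders, not verified against any particular docs version):

```shell
# 1. destroy the old OSD, marking it destroyed in the osdmap
#    while keeping its id reusable
ceph osd destroy 999 --yes-i-really-mean-it

# 2. zap the old block device so ceph-volume can reuse it
ceph-volume lvm zap /dev/sdzz

# 3. prepare the new Bluestore OSD, reusing the old id
ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz

# 4. activate it (takes the id and the new OSD's fsid,
#    as reported by "ceph-volume lvm list")
ceph-volume lvm activate 999 <osd-fsid>
```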
>
> I never got step 4 to complete. The closest I got was by doing the following
> steps (assuming OSD ID "999" on /dev/sdzz):
>
> 1. Stop the old OSD via systemd (osd-node # systemctl stop
> ceph-osd@999.service)
>
> 2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)
>
> 3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
> volume group
>
> 3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)
>
> 4. destroy the old OSD (osd-node # ceph osd destroy 999
> --yes-i-really-mean-it)
>
> 5. create a new OSD entry (osd-node # ceph osd new $(cat
> /var/lib/ceph/osd/ceph-999/fsid) 999)

Steps 5 and 6 are problematic if you are going to try ceph-volume
later on, since ceph-volume takes care of doing this for you.

>
> 6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
> osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
> /var/lib/ceph/osd/ceph-999/keyring)
>
> 7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
> --osd-id 999 --data /dev/sdzz)

You are going to hit a bug in ceph-volume that prevents you from
specifying the OSD id directly if that id has been destroyed.

See http://tracker.ceph.com/issues/22642

In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id to ceph-volume. The
caveat is that you will get whatever ID is next available in the
cluster.
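
Concretely, the workaround looks something like this (999 and
/dev/sdzz are illustrative, taken from the example in this thread):

```shell
# make sure the old id is really marked destroyed in the osdmap
ceph osd destroy 999 --yes-i-really-mean-it
ceph osd dump | grep -w 'osd.999'   # should show the "destroyed" flag

# do NOT pass --osd-id; let ceph-volume request the next free id
ceph-volume lvm create --bluestore --data /dev/sdzz
```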

>
> but ceph-osd keeps complaining "osdmap says I am destroyed, exiting" on
> "osd-node # systemctl start ceph-osd@999.service".
>
> At first I felt I was hitting http://tracker.ceph.com/issues/21023
> (BlueStore-OSDs marked as destroyed in OSD-map after v12.1.1 to v12.1.4
> upgrade). But I was already using the "ceph osd new" command, which didn't
> help.
>
> Some hours of sleep later I matched the issued commands to the osdmap
> changes and the ceph-osd log messages, which revealed something strange:
>
> - from issuing "ceph osd destroy", osdmap lists the OSD as
> "autoout,destroyed,exists" (no surprise here)
> - once I issued "ceph osd new", osdmap lists the OSD as "autoout,exists,new"
> - starting ceph-osd after "ceph osd new" reports "osdmap says I am
> destroyed, exiting"
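
(The flag transitions described above can be checked directly against
the osdmap; the id and epoch below are the ones from this report:)

```shell
# current state flags for the osd (e.g. "autoout,exists,new")
ceph osd dump | grep -w 'osd.999'

# state flags as recorded in a specific historic epoch
ceph osd dump 110587 | grep -w 'osd.999'
```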
>
> I can see in the ceph-osd log that it is referring to an *old* osdmap
> epoch, roughly 45 minutes old by then?
>
> This got me curious and I dug through the OSD log file, checking the epoch
> numbers during start-up:
>
> I took some detours, so there's more than two failed starts in the OSD log
> file ;) :
>
> --- cut here ---
> # first of multiple attempts, before "ceph auth add ..."
> # no actual epoch referenced, as login failed due to missing auth
> 2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
> 2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
> 2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using weightedpriority
> op queue with priority op cut off at 64.
> 2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
> {default=true}
> 2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
> failed: (1) Operation not permitted
>
> # after "ceph auth ..."
> # note the different epochs below? BTW, 110587 is the current epoch at that
> time and osd.999 is marked destroyed there
> # 109892: much too old to offer any details
> # 110587: modified 2018-01-09 23:43:13.202381
>
> 2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has features
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
> 2018-01-10 00:08:00.945594 7fc55905bd00  0 osd.999 0 load_pgs opened 0 pgs
> 2018-01-10 00:08:00.945599 7fc55905bd00  0 osd.999 0 using weightedpriority
> op queue with priority op cut off at 64.
> 2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
> {default=true}
> 2018-01-10 00:08:00.951720 7fc55905bd00  0 osd.999 0 done with init,
> starting boot process
> 2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
> osdmap
> 2018-01-10 00:08:00.970644 7fc546614700  0 osd.999 109892 crush map has
> features 288232610642264064, adjusting msgr requires for clients
> 2018-01-10 00:08:00.970653 7fc546614700  0 osd.999 109892 crush map has
> features 288232610642264064 was 288232575208792577, adjusting msgr requires
> for mons
> 2018-01-10 00:08:00.970660 7fc546614700  0 osd.999 109892 crush map has
> features 1008808551021559808, adjusting msgr requires for osds
> 2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
> destroyed, exiting
>
> # another try
> # it is now using epoch 110587 for everything. But that one is off by one at
> that time already:
> # 110587: modified 2018-01-09 23:43:13.202381
> # 110588: modified 2018-01-10 00:12:55.271913
>
> # but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
> 2018-01-10 00:13:04.332026 7f408d5a4d00  0 osd.999 110587 crush map has
> features 288232610642264064, adjusting msgr requires for clients
> 2018-01-10 00:13:04.332037 7f408d5a4d00  0 osd.999 110587 crush map has
> features 288232610642264064 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:13:04.332043 7f408d5a4d00  0 osd.999 110587 crush map has
> features 1008808551021559808, adjusting msgr requires for osds
> 2018-01-10 00:13:04.332092 7f408d5a4d00  0 osd.999 110587 load_pgs
> 2018-01-10 00:13:04.332096 7f408d5a4d00  0 osd.999 110587 load_pgs opened 0
> pgs
> 2018-01-10 00:13:04.332100 7f408d5a4d00  0 osd.999 110587 using
> weightedpriority op queue with priority op cut off at 64.
> 2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
> {default=true}
> 2018-01-10 00:13:06.026628 7f408d5a4d00  0 osd.999 110587 done with init,
> starting boot process
> 2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
> destroyed, exiting
>
> # the attempt after using "ceph osd new", which created epoch 110591 as the
> first with osd.999 as autoout,exists,new
> # But ceph-osd still uses 110587.
> # 110587: modified 2018-01-09 23:43:13.202381
> # 110591: modified 2018-01-10 00:30:44.850078
>
> 2018-01-10 00:31:15.453871 7f1c57c58d00  0 osd.999 110587 crush map has
> features 288232610642264064, adjusting msgr requires for clients
> 2018-01-10 00:31:15.453882 7f1c57c58d00  0 osd.999 110587 crush map has
> features 288232610642264064 was 8705, adjusting msgr requires for mons
> 2018-01-10 00:31:15.453887 7f1c57c58d00  0 osd.999 110587 crush map has
> features 1008808551021559808, adjusting msgr requires for osds
> 2018-01-10 00:31:15.453940 7f1c57c58d00  0 osd.999 110587 load_pgs
> 2018-01-10 00:31:15.453945 7f1c57c58d00  0 osd.999 110587 load_pgs opened 0
> pgs
> 2018-01-10 00:31:15.453952 7f1c57c58d00  0 osd.999 110587 using
> weightedpriority op queue with priority op cut off at 64.
> 2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
> {default=true}
> 2018-01-10 00:31:15.520533 7f1c57c58d00  0 osd.999 110587 done with init,
> starting boot process
> 2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
> destroyed, exiting
> --- cut here ---
>
>
> So why is ceph-osd referring to an old osdmap, while newer ones are
> available for some time already?
>
> And am I right to believe that *if* ceph-osd had checked the then current
> osdmap, it would have started successfully (once I did the "ceph osd new"
> that's not mentioned in the docs)?
>
> Is the documented procedure (from the "master" HTML docs) correct, or should
> the "ceph auth" and "ceph osd new" steps get added?
>
> Regards,
> Jens
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com