Re: filestore to bluestore: osdmap epoch problem and is the documentation correct?

"Jens-U. Mozdzen" <jmozdzen@xxxxxx> · Wed, 10 Jan 2018 14:29:16 +0000

Hi Alfredo,

thank you for your comments:

Zitat von Alfredo Deza <adeza@xxxxxxxxxx>:
On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen <jmozdzen@xxxxxx> wrote:
Dear *,

has anybody been successful migrating Filestore OSDs to Bluestore OSDs,
keeping the OSD number? There have been a number of messages on the list,
reporting problems, and my experience is the same. (Removing the existing
OSD and creating a new one does work for me.)

I'm working on an Ceph 12.2.2 cluster and tried following
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd
- this basically says

1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD

I never got step 4 to complete. The closest I got was by doing the following
steps (assuming OSD ID "999" on /dev/sdzz):

1. Stop the old OSD via systemd (osd-node # systemctl stop
ceph-osd@999.service)

2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)

3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's
volume group

3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)

4. destroy the old OSD (osd-node # ceph osd destroy 999
--yes-i-really-mean-it)

5. create a new OSD entry (osd-node # ceph osd new $(cat
/var/lib/ceph/osd/ceph-999/fsid) 999)

Step 5 and 6 are problematic if you are going to be trying ceph-volume
later on, which takes care of doing this for you.

6. add the OSD secret to Ceph authentication (osd-node # ceph auth add
osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i
/var/lib/ceph/osd/ceph-999/keyring)

I at first tried to follow the documented steps (without my steps 5  
and 6), which did not work for me. The documented approach failed with  
"init authentication >> failed: (1) Operation not permitted", because  
actually ceph-volume did not add the auth entry for me.

But even after manually adding the authentication, the "ceph-volume"  
approach failed, as the OSD was still marked "destroyed" in the osdmap  
epoch as used by ceph-osd (see the commented messages from  
ceph-osd.999.log below).

7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore
--osd-id 999 --data /dev/sdzz)

You are going to hit a bug in ceph-volume that is preventing you from
specifying the osd id directly if the ID has been destroyed.

See http://tracker.ceph.com/issues/22642

If I read that bug description correctly, you're confirming why I  
needed step #6 above (manually adding the OSD auth entry. But even if  
ceph-volume had added it, the ceph-osd.log entries suggest that  
starting the OSD would still have failed, because of accessing the  
wrong osdmap epoch.

To me it seems like I'm hitting a bug outside of ceph-volume - unless  
it's ceph-volume that somehow determines which osdmap epoch is used by  
ceph-osd.

In order for this to work, you would need to make sure that the ID has
really been destroyed and avoid passing --osd-id in ceph-volume. The
caveat
being that you will get whatever ID is available next in the cluster.

Yes, that's the work-around I then used - purge the old OSD and create  
a new one.

Thanks & regards,
Jens

[...]
--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication
failed: (1) Operation not permitted

# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that
time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381

2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00  0 osd.999 0 using weightedpriority
op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors
{default=true}
2018-01-10 00:08:00.951720 7fc55905bd00  0 osd.999 0 done with init,
starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial
osdmap
2018-01-10 00:08:00.970644 7fc546614700  0 osd.999 109892 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700  0 osd.999 109892 crush map has
features 288232610642264064 was 288232575208792577, adjusting msgr requires
for mons
2018-01-10 00:08:00.970660 7fc546614700  0 osd.999 109892 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am
destroyed, exiting

# another try
# it is now using epoch 110587 for everything. But that one is off by one at
that time already:
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913

# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00  0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00  0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00  0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00  0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00  0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:13:04.332100 7f408d5a4d00  0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00  0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am
destroyed, exiting

# the attempt after using "ceph osd new", which created epoch 110591 as the
first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078

2018-01-10 00:31:15.453871 7f1c57c58d00  0 osd.999 110587 crush map has
features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00  0 osd.999 110587 crush map has
features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00  0 osd.999 110587 crush map has
features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00  0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00  0 osd.999 110587 load_pgs opened 0
pgs
2018-01-10 00:31:15.453952 7f1c57c58d00  0 osd.999 110587 using
weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors
{default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00  0 osd.999 110587 done with init,
starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am
destroyed, exiting
--- cut here ---
[...]

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com