filestore to bluestore: osdmap epoch problem and is the documentation correct?


 



Dear *,

Has anybody been successful in migrating Filestore OSDs to Bluestore OSDs while keeping the OSD number? There have been a number of messages on the list reporting problems, and my experience is the same. (Removing the existing OSD and creating a new one does work for me.)

I'm working on a Ceph 12.2.2 cluster and tried to follow http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd - this basically says:

1. destroy old OSD
2. zap the disk
3. prepare the new OSD
4. activate the new OSD
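
As a sketch, the documented procedure boils down to the following commands (OSD id 999, /dev/sdzz and the OSD fsid are placeholders, not values from the docs):

```shell
# 1. destroy the old OSD (keeps the id and CRUSH position)
ceph osd destroy 999 --yes-i-really-mean-it
# 2. zap the disk
ceph-volume lvm zap /dev/sdzz
# 3. prepare the new OSD, reusing the old id
ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz
# 4. activate the new OSD (<osd-fsid> as reported by "ceph-volume lvm list")
ceph-volume lvm activate 999 <osd-fsid>
```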

I never got step 4 to complete. The closest I got was by doing the following steps (assuming OSD ID "999" on /dev/sdzz):

1. Stop the old OSD via systemd (osd-node # systemctl stop ceph-osd@999.service)

2. umount the old OSD (osd-node # umount /var/lib/ceph/osd/ceph-999)

3a. if the old OSD was Bluestore with LVM, manually clean up the old OSD's volume group

3b. zap the block device (osd-node # ceph-volume lvm zap /dev/sdzz)

4. destroy the old OSD (osd-node # ceph osd destroy 999 --yes-i-really-mean-it)

5. create a new OSD entry (osd-node # ceph osd new $(cat /var/lib/ceph/osd/ceph-999/fsid) 999)

6. add the OSD secret to Ceph authentication (osd-node # ceph auth add osd.999 mgr 'allow profile osd' osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring)

7. prepare the new OSD (osd-node # ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz)
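
For reference, here are the same steps condensed into one sequence (OSD id 999 and /dev/sdzz are the placeholders from above; this is what I actually ran, not an endorsed procedure):

```shell
# stop and unmount the old Filestore OSD
systemctl stop ceph-osd@999.service
umount /var/lib/ceph/osd/ceph-999
# zap the block device (clean up any leftover LVM volume group first, if present)
ceph-volume lvm zap /dev/sdzz
# destroy the old OSD entry, then recreate it with the same id
ceph osd destroy 999 --yes-i-really-mean-it
ceph osd new $(cat /var/lib/ceph/osd/ceph-999/fsid) 999
# re-add the OSD key to Ceph authentication
ceph auth add osd.999 mgr 'allow profile osd' osd 'allow *' \
    mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-999/keyring
# prepare the new Bluestore OSD, reusing the id, then start it
ceph-volume lvm prepare --bluestore --osd-id 999 --data /dev/sdzz
systemctl start ceph-osd@999.service
```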

but ceph-osd keeps complaining "osdmap says I am destroyed, exiting" on "osd-node # systemctl start ceph-osd@999.service".

At first I felt I was hitting http://tracker.ceph.com/issues/21023 (BlueStore-OSDs marked as destroyed in OSD-map after v12.1.1 to v12.1.4 upgrade). But I was already using the "ceph osd new" command, which didn't help.

Some hours of sleep later I matched the issued commands to the osdmap changes and the ceph-osd log messages, which revealed something strange:

- after issuing "ceph osd destroy", the osdmap lists the OSD as "autoout,destroyed,exists" (no surprise here)
- after issuing "ceph osd new", the osdmap lists the OSD as "autoout,exists,new"
- starting ceph-osd after "ceph osd new" still reports "osdmap says I am destroyed, exiting"
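
The state flags above can be read straight from the current osdmap (OSD id is the placeholder again):

```shell
# print the current osdmap entry for osd.999; the state flags
# ("autoout,destroyed,exists" or "autoout,exists,new") are part of the line
ceph osd dump | grep '^osd.999 '
```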

I can see in the ceph-osd log that it refers to an *old* osdmap epoch, roughly 45 minutes old by that time.

This got me curious and I dug through the OSD log file, checking the epoch numbers during start-up:
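
The epochs from the log can be cross-checked against the cluster's map history, e.g. (epoch 110587 taken from my log below; the commands are standard, but this is just how I went about it):

```shell
# show the current osdmap epoch
ceph osd stat
# fetch a specific historical epoch and inspect it;
# osdmaptool prints the "modified" timestamp and the per-OSD states
ceph osd getmap 110587 -o /tmp/osdmap.110587
osdmaptool --print /tmp/osdmap.110587 | head -n 20
```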

I took some detours, so there are more than two failed starts in the OSD log file ;) :

--- cut here ---
# first of multiple attempts, before "ceph auth add ..."
# no actual epoch referenced, as login failed due to missing auth
2018-01-10 00:00:02.173983 7f5cf1c89d00  0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:00:02.173990 7f5cf1c89d00  0 osd.999 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:00:02.173994 7f5cf1c89d00  0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for osds
2018-01-10 00:00:02.174046 7f5cf1c89d00  0 osd.999 0 load_pgs
2018-01-10 00:00:02.174051 7f5cf1c89d00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:00:02.174055 7f5cf1c89d00  0 osd.999 0 using weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:00:02.174891 7f5cf1c89d00 -1 osd.999 0 log_to_monitors {default=true}
2018-01-10 00:00:02.177479 7f5cf1c89d00 -1 osd.999 0 init authentication failed: (1) Operation not permitted

# after "ceph auth ..."
# note the different epochs below? BTW, 110587 is the current epoch at that time and osd.999 is marked destroyed there
# 109892: much too old to offer any details
# 110587: modified 2018-01-09 23:43:13.202381

2018-01-10 00:08:00.945507 7fc55905bd00  0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for clients
2018-01-10 00:08:00.945514 7fc55905bd00  0 osd.999 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
2018-01-10 00:08:00.945521 7fc55905bd00  0 osd.999 0 crush map has features 288232575208783872, adjusting msgr requires for osds
2018-01-10 00:08:00.945588 7fc55905bd00  0 osd.999 0 load_pgs
2018-01-10 00:08:00.945594 7fc55905bd00  0 osd.999 0 load_pgs opened 0 pgs
2018-01-10 00:08:00.945599 7fc55905bd00  0 osd.999 0 using weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:08:00.946544 7fc55905bd00 -1 osd.999 0 log_to_monitors {default=true}
2018-01-10 00:08:00.951720 7fc55905bd00  0 osd.999 0 done with init, starting boot process
2018-01-10 00:08:00.952225 7fc54160a700 -1 osd.999 0 waiting for initial osdmap
2018-01-10 00:08:00.970644 7fc546614700  0 osd.999 109892 crush map has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:08:00.970653 7fc546614700  0 osd.999 109892 crush map has features 288232610642264064 was 288232575208792577, adjusting msgr requires for mons
2018-01-10 00:08:00.970660 7fc546614700  0 osd.999 109892 crush map has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:08:01.349602 7fc546614700 -1 osd.999 110587 osdmap says I am destroyed, exiting

# another try
# it is now using epoch 110587 for everything. But that one is off by one at that time already:
# 110587: modified 2018-01-09 23:43:13.202381
# 110588: modified 2018-01-10 00:12:55.271913

# but both 110587 and 110588 have osd.999 as "destroyed", so never mind.
2018-01-10 00:13:04.332026 7f408d5a4d00  0 osd.999 110587 crush map has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:13:04.332037 7f408d5a4d00  0 osd.999 110587 crush map has features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:13:04.332043 7f408d5a4d00  0 osd.999 110587 crush map has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:13:04.332092 7f408d5a4d00  0 osd.999 110587 load_pgs
2018-01-10 00:13:04.332096 7f408d5a4d00  0 osd.999 110587 load_pgs opened 0 pgs
2018-01-10 00:13:04.332100 7f408d5a4d00  0 osd.999 110587 using weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:13:04.332990 7f408d5a4d00 -1 osd.999 110587 log_to_monitors {default=true}
2018-01-10 00:13:06.026628 7f408d5a4d00  0 osd.999 110587 done with init, starting boot process
2018-01-10 00:13:06.027627 7f4075352700 -1 osd.999 110587 osdmap says I am destroyed, exiting

# the attempt after using "ceph osd new", which created epoch 110591 as the first with osd.999 as autoout,exists,new
# But ceph-osd still uses 110587.
# 110587: modified 2018-01-09 23:43:13.202381
# 110591: modified 2018-01-10 00:30:44.850078

2018-01-10 00:31:15.453871 7f1c57c58d00  0 osd.999 110587 crush map has features 288232610642264064, adjusting msgr requires for clients
2018-01-10 00:31:15.453882 7f1c57c58d00  0 osd.999 110587 crush map has features 288232610642264064 was 8705, adjusting msgr requires for mons
2018-01-10 00:31:15.453887 7f1c57c58d00  0 osd.999 110587 crush map has features 1008808551021559808, adjusting msgr requires for osds
2018-01-10 00:31:15.453940 7f1c57c58d00  0 osd.999 110587 load_pgs
2018-01-10 00:31:15.453945 7f1c57c58d00  0 osd.999 110587 load_pgs opened 0 pgs
2018-01-10 00:31:15.453952 7f1c57c58d00  0 osd.999 110587 using weightedpriority op queue with priority op cut off at 64.
2018-01-10 00:31:15.454862 7f1c57c58d00 -1 osd.999 110587 log_to_monitors {default=true}
2018-01-10 00:31:15.520533 7f1c57c58d00  0 osd.999 110587 done with init, starting boot process
2018-01-10 00:31:15.521278 7f1c40207700 -1 osd.999 110587 osdmap says I am destroyed, exiting
--- cut here ---


So why is ceph-osd referring to an old osdmap, while newer ones had been available for some time already?

And am I right to believe that *if* ceph-osd had checked the then-current osdmap, it would have started successfully (once I had done the "ceph osd new" that's not mentioned in the docs)?

Is the documented procedure (from the "master" HTML docs) correct, or should the "ceph auth" and "ceph osd new" steps get added?

Regards,
Jens

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


