The osdmaps on this particular OSD are corrupted. You need to
determine which maps are bad, probably by bumping the osd debug
level to 20, and then transfer them from a working OSD. The newest
ceph-objectstore-tool has features to write the maps, but you'll
need to build a version based on a v0.94.4 source tree. I don't
know if you can just copy files with names like
"current/meta/osdmap.8__0_FD6E4D61__none" (map for epoch 8) between
OSDs.
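
As a read-only first step before touching anything, the file naming
mentioned above can be used to list which osdmap epochs an OSD holds
on disk. The following is only a minimal sketch, not a Ceph tool: the
function names are hypothetical, the file name pattern is inferred
from the single example above, and treating a zero-length file as
suspect is an assumption (a map can be bad in other ways that this
will not catch).

```python
import re
from pathlib import Path

# Pattern inferred from the one example name in this thread:
#   osdmap.8__0_FD6E4D61__none  -> epoch 8
# Assumption: every full osdmap file follows "osdmap.<epoch>__...".
OSDMAP_NAME = re.compile(r"^osdmap\.(\d+)__")


def osdmap_epoch(filename):
    """Return the epoch encoded in an osdmap file name, or None."""
    match = OSDMAP_NAME.match(filename)
    return int(match.group(1)) if match else None


def list_osdmap_epochs(meta_dir):
    """Return {epoch: path} for osdmap files found under meta_dir.

    The search is recursive in case the files sit in subdirectories.
    """
    epochs = {}
    for path in Path(meta_dir).rglob("osdmap.*"):
        epoch = osdmap_epoch(path.name)
        if epoch is not None and path.is_file():
            epochs[epoch] = path
    return epochs


def empty_epochs(meta_dir):
    """Return sorted epochs whose osdmap file is zero bytes long.

    Assumption: an empty file is one plausible form of a bad map;
    a non-empty file may still be corrupt.
    """
    found = list_osdmap_epochs(meta_dir)
    return sorted(e for e, p in found.items() if p.stat().st_size == 0)
```

For the OSD in this thread the directory to pass would presumably be
/var/lib/ceph/osd/ceph-1/current/meta, with the OSD stopped. The
sketch only reads file names and sizes and does not modify the store.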
David
On 10/21/15 8:54 PM, James O'Neill wrote:
I have an OSD that didn't come up after a reboot. I was getting the
error shown below. It was running 0.94.3, so I reinstalled all
packages. I then upgraded everything to 0.94.4, hoping that would
fix it, but it hasn't. There are three OSDs; this is the only one
having problems (it also contains the inconsistent pgs). Can
anyone tell me what the problem might be?
root@dbp-ceph03:/srv/data# ceph status
cluster 4f6fb784-bd17-4105-a689-e8d1b4bc5643
health HEALTH_ERR
53 pgs inconsistent
542 pgs stale
542 pgs stuck stale
5 requests are blocked > 32 sec
85 scrub errors
too many PGs per OSD (544 > max 300)
noout flag(s) set
monmap e3: 3 mons at
{dbp-ceph01=172.17.241.161:6789/0,dbp-ceph02=172.17.241.162:6789/0,dbp-ceph03=172.17.241.163:6789/0}
election epoch 52, quorum 0,1,2
dbp-ceph01,dbp-ceph02,dbp-ceph03
osdmap e107: 2 osds: 2 up, 2 in
flags noout
pgmap v65678: 1088 pgs, 9 pools, 55199 kB data, 173 objects
2265 MB used, 16580 MB / 19901 MB avail
546 active+clean
489 stale+active+clean
53 stale+active+clean+inconsistent
root@dbp-ceph02:~# /usr/bin/ceph-osd --cluster=ceph -i 1 -d
2015-10-22 14:15:48.312507 7f4edabec900 0 ceph version 0.94.4
(95292699291242794510b39ffde3f4df67898d3a), process ceph-osd, pid
31215
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1
/var/lib/ceph/osd/ceph-1/journal
2015-10-22 14:15:48.352013 7f4edabec900 0
filestore(/var/lib/ceph/osd/ceph-1) backend generic (magic 0xef53)
2015-10-22 14:15:48.355621 7f4edabec900 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is supported and appears to work
2015-10-22 14:15:48.355655 7f4edabec900 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-10-22 14:15:48.362016 7f4edabec900 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2015-10-22 14:15:48.372819 7f4edabec900 0
filestore(/var/lib/ceph/osd/ceph-1) limited size xattrs
2015-10-22 14:15:48.387002 7f4edabec900 0
filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD
journal mode: checkpoint is not enabled
2015-10-22 14:15:48.394002 7f4edabec900 -1 journal
FileJournal::_open: disabling aio for non-block journal. Use
journal_force_aio to force use of aio anyway
2015-10-22 14:15:48.397803 7f4edabec900 0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
terminate called after throwing an instance of
'ceph::buffer::end_of_buffer'
what(): buffer::end_of_buffer
*** Caught signal (Aborted) **
in thread 7f4edabec900
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: /usr/bin/ceph-osd() [0xacd94a]
2: (()+0x10340) [0x7f4ed98a1340]
3: (gsignal()+0x39) [0x7f4ed7d3fcc9]
4: (abort()+0x148) [0x7f4ed7d430d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155)
[0x7f4ed864b6b5]
6: (()+0x5e836) [0x7f4ed8649836]
7: (()+0x5e863) [0x7f4ed8649863]
8: (()+0x5eaa2) [0x7f4ed8649aa2]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137)
[0xc35ef7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d)
[0xb834ed]
11: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb8560f]
12: (OSDService::try_get_map(unsigned int)+0x530) [0x6ac2c0]
13: (OSDService::get_map(unsigned int)+0xe) [0x70ad2e]
14: (OSD::init()+0x6ad) [0x6c5e0d]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7f4ed7d2aec5]
17: /usr/bin/ceph-osd() [0x66b887]
2015-10-22 14:15:48.412520 7f4edabec900 -1 *** Caught signal
(Aborted) **
in thread 7f4edabec900
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: /usr/bin/ceph-osd() [0xacd94a]
2: (()+0x10340) [0x7f4ed98a1340]
3: (gsignal()+0x39) [0x7f4ed7d3fcc9]
4: (abort()+0x148) [0x7f4ed7d430d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155)
[0x7f4ed864b6b5]
6: (()+0x5e836) [0x7f4ed8649836]
7: (()+0x5e863) [0x7f4ed8649863]
8: (()+0x5eaa2) [0x7f4ed8649aa2]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137)
[0xc35ef7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d)
[0xb834ed]
11: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb8560f]
12: (OSDService::try_get_map(unsigned int)+0x530) [0x6ac2c0]
13: (OSDService::get_map(unsigned int)+0xe) [0x70ad2e]
14: (OSD::init()+0x6ad) [0x6c5e0d]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7f4ed7d2aec5]
17: /usr/bin/ceph-osd() [0x66b887]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- begin dump of recent events ---
-61> 2015-10-22 14:15:48.308047 7f4edabec900 5
asok(0x5648000) register_command perfcounters_dump hook 0x55e8050
-60> 2015-10-22 14:15:48.308138 7f4edabec900 5
asok(0x5648000) register_command 1 hook 0x55e8050
-59> 2015-10-22 14:15:48.308164 7f4edabec900 5
asok(0x5648000) register_command perf dump hook 0x55e8050
-58> 2015-10-22 14:15:48.308181 7f4edabec900 5
asok(0x5648000) register_command perfcounters_schema hook
0x55e8050
-57> 2015-10-22 14:15:48.308192 7f4edabec900 5
asok(0x5648000) register_command 2 hook 0x55e8050
-56> 2015-10-22 14:15:48.308198 7f4edabec900 5
asok(0x5648000) register_command perf schema hook 0x55e8050
-55> 2015-10-22 14:15:48.308223 7f4edabec900 5
asok(0x5648000) register_command perf reset hook 0x55e8050
-54> 2015-10-22 14:15:48.308242 7f4edabec900 5
asok(0x5648000) register_command config show hook 0x55e8050
-53> 2015-10-22 14:15:48.308249 7f4edabec900 5
asok(0x5648000) register_command config set hook 0x55e8050
-52> 2015-10-22 14:15:48.308254 7f4edabec900 5
asok(0x5648000) register_command config get hook 0x55e8050
-51> 2015-10-22 14:15:48.308259 7f4edabec900 5
asok(0x5648000) register_command config diff hook 0x55e8050
-50> 2015-10-22 14:15:48.308263 7f4edabec900 5
asok(0x5648000) register_command log flush hook 0x55e8050
-49> 2015-10-22 14:15:48.308268 7f4edabec900 5
asok(0x5648000) register_command log dump hook 0x55e8050
-48> 2015-10-22 14:15:48.308274 7f4edabec900 5
asok(0x5648000) register_command log reopen hook 0x55e8050
-47> 2015-10-22 14:15:48.312507 7f4edabec900 0 ceph version
0.94.4 (95292699291242794510b39ffde3f4df67898d3a), process
ceph-osd, pid 31215
-46> 2015-10-22 14:15:48.313730 7f4edabec900 1 --
172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-45> 2015-10-22 14:15:48.313762 7f4edabec900 1
accepter.accepter.bind my_inst.addr is 172.17.241.162:6800/31215
need_addr=0
-44> 2015-10-22 14:15:48.313795 7f4edabec900 1 --
172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-43> 2015-10-22 14:15:48.313803 7f4edabec900 1
accepter.accepter.bind my_inst.addr is 172.17.241.162:6801/31215
need_addr=0
-42> 2015-10-22 14:15:48.313825 7f4edabec900 1 --
172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-41> 2015-10-22 14:15:48.313832 7f4edabec900 1
accepter.accepter.bind my_inst.addr is 172.17.241.162:6802/31215
need_addr=0
-40> 2015-10-22 14:15:48.313855 7f4edabec900 1 --
172.17.241.162:0/0 learned my addr 172.17.241.162:0/0
-39> 2015-10-22 14:15:48.313863 7f4edabec900 1
accepter.accepter.bind my_inst.addr is 172.17.241.162:6803/31215
need_addr=0
-38> 2015-10-22 14:15:48.317379 7f4edabec900 5
asok(0x5648000) init /var/run/ceph/ceph-osd.1.asok
-37> 2015-10-22 14:15:48.317419 7f4edabec900 5
asok(0x5648000) bind_and_listen /var/run/ceph/ceph-osd.1.asok
-36> 2015-10-22 14:15:48.317480 7f4edabec900 5
asok(0x5648000) register_command 0 hook 0x55e40a8
-35> 2015-10-22 14:15:48.317502 7f4edabec900 5
asok(0x5648000) register_command version hook 0x55e40a8
-34> 2015-10-22 14:15:48.317508 7f4edabec900 5
asok(0x5648000) register_command git_version hook 0x55e40a8
-33> 2015-10-22 14:15:48.317515 7f4edabec900 5
asok(0x5648000) register_command help hook 0x55e8140
-32> 2015-10-22 14:15:48.317520 7f4edabec900 5
asok(0x5648000) register_command get_command_descriptions hook
0x55e8130
-31> 2015-10-22 14:15:48.317624 7f4edabec900 10
monclient(hunting): build_initial_monmap
-30> 2015-10-22 14:15:48.317654 7f4ed44f2700 5
asok(0x5648000) entry start
-29> 2015-10-22 14:15:48.350458 7f4edabec900 5 adding auth
protocol: none
-28> 2015-10-22 14:15:48.350522 7f4edabec900 5 adding auth
protocol: none
-27> 2015-10-22 14:15:48.350815 7f4edabec900 5
asok(0x5648000) register_command objecter_requests hook 0x55e8230
-26> 2015-10-22 14:15:48.351004 7f4edabec900 5
filestore(/var/lib/ceph/osd/ceph-1) test_mount basedir
/var/lib/ceph/osd/ceph-1 journal /var/lib/ceph/osd/ceph-1/journal
-25> 2015-10-22 14:15:48.351200 7f4edabec900 1 --
172.17.241.162:6800/31215 messenger.start
-24> 2015-10-22 14:15:48.351333 7f4edabec900 1 -- :/0
messenger.start
-23> 2015-10-22 14:15:48.351404 7f4edabec900 1 --
172.17.241.162:6803/31215 messenger.start
-22> 2015-10-22 14:15:48.351473 7f4edabec900 1 --
172.17.241.162:6802/31215 messenger.start
-21> 2015-10-22 14:15:48.351537 7f4edabec900 1 --
172.17.241.162:6801/31215 messenger.start
-20> 2015-10-22 14:15:48.351599 7f4edabec900 1 -- :/0
messenger.start
-19> 2015-10-22 14:15:48.351832 7f4edabec900 2 osd.1 0
mounting /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
-18> 2015-10-22 14:15:48.351874 7f4edabec900 5
filestore(/var/lib/ceph/osd/ceph-1) basedir
/var/lib/ceph/osd/ceph-1 journal /var/lib/ceph/osd/ceph-1/journal
-17> 2015-10-22 14:15:48.352013 7f4edabec900 0
filestore(/var/lib/ceph/osd/ceph-1) backend generic (magic 0xef53)
-16> 2015-10-22 14:15:48.355621 7f4edabec900 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is supported and appears to work
-15> 2015-10-22 14:15:48.355655 7f4edabec900 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
-14> 2015-10-22 14:15:48.362016 7f4edabec900 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
-13> 2015-10-22 14:15:48.372819 7f4edabec900 0
filestore(/var/lib/ceph/osd/ceph-1) limited size xattrs
-12> 2015-10-22 14:15:48.373025 7f4edabec900 5
filestore(/var/lib/ceph/osd/ceph-1) mount op_seq is 128790
-11> 2015-10-22 14:15:48.387002 7f4edabec900 0
filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD
journal mode: checkpoint is not enabled
-10> 2015-10-22 14:15:48.394002 7f4edabec900 -1 journal
FileJournal::_open: disabling aio for non-block journal. Use
journal_force_aio to force use of aio anyway
-9> 2015-10-22 14:15:48.395535 7f4edabec900 2 osd.1 0 boot
-8> 2015-10-22 14:15:48.397803 7f4edabec900 0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
-7> 2015-10-22 14:15:48.398072 7f4edabec900 1 <cls>
cls/statelog/cls_statelog.cc:306: Loaded log class!
-6> 2015-10-22 14:15:48.398603 7f4edabec900 1 <cls>
cls/user/cls_user.cc:367: Loaded user class!
-5> 2015-10-22 14:15:48.398855 7f4edabec900 1 <cls>
cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-4> 2015-10-22 14:15:48.399120 7f4edabec900 1 <cls>
cls/log/cls_log.cc:312: Loaded log class!
-3> 2015-10-22 14:15:48.404859 7f4edabec900 1 <cls>
cls/refcount/cls_refcount.cc:231: Loaded refcount class!
-2> 2015-10-22 14:15:48.408976 7f4edabec900 1 <cls>
cls/rgw/cls_rgw.cc:3047: Loaded rgw class!
-1> 2015-10-22 14:15:48.409169 7f4edabec900 1 <cls>
cls/version/cls_version.cc:227: Loaded version class!
0> 2015-10-22 14:15:48.412520 7f4edabec900 -1 *** Caught
signal (Aborted) **
in thread 7f4edabec900
ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
1: /usr/bin/ceph-osd() [0xacd94a]
2: (()+0x10340) [0x7f4ed98a1340]
3: (gsignal()+0x39) [0x7f4ed7d3fcc9]
4: (abort()+0x148) [0x7f4ed7d430d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155)
[0x7f4ed864b6b5]
6: (()+0x5e836) [0x7f4ed8649836]
7: (()+0x5e863) [0x7f4ed8649863]
8: (()+0x5eaa2) [0x7f4ed8649aa2]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x137)
[0xc35ef7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d)
[0xb834ed]
11: (OSDMap::decode(ceph::buffer::list&)+0x3f) [0xb8560f]
12: (OSDService::try_get_map(unsigned int)+0x530) [0x6ac2c0]
13: (OSDService::get_map(unsigned int)+0xe) [0x70ad2e]
14: (OSD::init()+0x6ad) [0x6c5e0d]
15: (main()+0x2860) [0x6527e0]
16: (__libc_start_main()+0xf5) [0x7f4ed7d2aec5]
17: /usr/bin/ceph-osd() [0x66b887]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
0/ 5 filestore
1/ 3 keyvaluestore
0/ 0 journal
0/ 5 ms
1/ 5 mon
0/20 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 10000
max_new 1000
log_file
--- end dump of recent events ---
Aborted (core dumped)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com