Hey folks,
I'm helping put together a new test/experimental cluster, and hit this today when bringing the cluster up for the first time (using mkcephfs).
After doing the normal "service ceph -a start", I noticed one OSD was down, and a lot of PGs were stuck creating. I tried restarting the down OSD, but it would not come up. It always failed with this error:
-1> 2013-04-27 18:11:56.179804 b6fcd000 2 osd.1 0 boot
0> 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089
osd/PG.cc: 2556: FAILED assert(values.size() == 1)
ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04)
1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x1ad) [0x2c3c0a]
2: (OSD::load_pgs()+0x357) [0x28cba0]
3: (OSD::init()+0x741) [0x290a16]
4: (main()+0x1427) [0x2155c0]
5: (__libc_start_main()+0x99) [0xb69bcf42]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I then did a full cluster restart, and now I have ten OSDs down -- each showing the same exception/failed assert.
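For anyone wanting to check how many OSDs hit the same assert, here is a minimal sketch of scanning the per-OSD logs for it. It assumes the logs live in /var/log/ceph and are named like ceph-osd.N.log, which may not match a source build like this one; the function name osds_with_assert is made up for illustration.

```python
import re
from pathlib import Path

# Matches the failed assert line from the trace above; the source line
# number (2556 here) is left flexible in case other builds differ.
ASSERT_RE = re.compile(r"osd/PG\.cc: \d+: FAILED assert\(values\.size\(\) == 1\)")


def osds_with_assert(log_dir):
    """Return the sorted names of OSD log files in log_dir that contain the assert."""
    hits = []
    # Assumption: log files are named ceph-osd.<id>.log
    for path in sorted(Path(log_dir).glob("ceph-osd.*.log")):
        text = path.read_text(errors="replace")
        if ASSERT_RE.search(text):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Assumption: default log directory; adjust for the host being checked.
    for name in osds_with_assert("/var/log/ceph"):
        print(name)
```

This would need to be run on each OSD host, since every host only holds the logs for its own daemons.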
I know I'm running a weird version -- it's compiled from source, and was provided to me. The OSDs are all on ARM, and the mon is x86_64. Just looking to see if anyone has seen this particular stack trace of load_pgs()/peek_map_epoch() before....
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com