Hello,

I am, however, running the latest Jewel release (10.2.7), although I’m still in the middle of upgrading the cluster from 10.2.5. At first the crashes were confined to a couple of nodes, but now the problem seems to be more pervasive. I have seen this issue with osd_map_cache_size set to 20 as well as 500 (I increased it to try to compensate).

My two questions are:

1) Is this fixed, and if so, in which version?
2) Is there a way to recover the damaged OSD metadata? I really don’t want to keep having to rebuild large numbers of disks over something this arbitrary. (I’ve put a rough sketch of what I was thinking of trying below the log.)

The tail of the OSD log from the crash:

SEEK_HOLE is disabled via 'filestore seek data hole' config option
-31> 2017-05-24 10:23:10.152349 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
-30> 2017-05-24 10:23:10.182065 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-29> 2017-05-24 10:23:10.182112 7f24035e2800  0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
-28> 2017-05-24 10:23:10.182839 7f24035e2800  1 leveldb: Recovering log #23079
-27> 2017-05-24 10:23:10.284173 7f24035e2800  1 leveldb: Delete type=0 #23079
-26> 2017-05-24 10:23:10.284223 7f24035e2800  1 leveldb: Delete type=3 #23078
-25> 2017-05-24 10:23:10.284807 7f24035e2800  0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
-24> 2017-05-24 10:23:10.285581 7f24035e2800  2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
-23> 2017-05-24 10:23:10.289523 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-22> 2017-05-24 10:23:10.293733 7f24035e2800  2 journal open advancing committed_seq 20363681 to fs op_seq 20363902
-21> 2017-05-24 10:23:10.293743 7f24035e2800  2 journal read_entry -- not readable
-20> 2017-05-24 10:23:10.293744 7f24035e2800  2 journal read_entry -- not readable
-19> 2017-05-24 10:23:10.293745 7f24035e2800  3 journal journal_replay: end of journal, done.
-18> 2017-05-24 10:23:10.297605 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-17> 2017-05-24 10:23:10.298470 7f24035e2800  1 filestore(/var/lib/ceph/osd/txc1-1908) upgrade
-16> 2017-05-24 10:23:10.298509 7f24035e2800  2 osd.1908 0 boot
-15> 2017-05-24 10:23:10.300096 7f24035e2800  1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-14> 2017-05-24 10:23:10.300384 7f24035e2800  1 <cls> cls/user/cls_user.cc:375: Loaded user class!
-13> 2017-05-24 10:23:10.300617 7f24035e2800  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
-12> 2017-05-24 10:23:10.303748 7f24035e2800  1 <cls> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
-11> 2017-05-24 10:23:10.304120 7f24035e2800  1 <cls> cls/version/cls_version.cc:228: Loaded version class!
-10> 2017-05-24 10:23:10.304439 7f24035e2800  1 <cls> cls/log/cls_log.cc:317: Loaded log class!
-9> 2017-05-24 10:23:10.307437 7f24035e2800  1 <cls> cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
-8> 2017-05-24 10:23:10.307768 7f24035e2800  1 <cls> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
-7> 2017-05-24 10:23:10.307927 7f24035e2800  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
-6> 2017-05-24 10:23:10.308086 7f24035e2800  1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-5> 2017-05-24 10:23:10.315241 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for clients
-4> 2017-05-24 10:23:10.315258 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320 was 8705, adjusting msgr requires for mons
-3> 2017-05-24 10:23:10.315267 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for osds
-2> 2017-05-24 10:23:10.441444 7f24035e2800  0 osd.1908 863035 load_pgs
-1> 2017-05-24 10:23:10.442608 7f24035e2800 -1 osd.1908 863035 load_pgs: have pgid 11.3f5a at epoch 863078, but missing map. Crashing.
 0> 2017-05-24 10:23:10.444151 7f24035e2800 -1 osd/OSD.cc: In function 'void OSD::load_pgs()' thread 7f24035e2800 time 2017-05-24 10:23:10.442617
osd/OSD.cc: 3189: FAILED assert(0 == "Missing map in load_pgs")

ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x55d1874be6db]
2: (OSD::load_pgs()+0x1f9b) [0x55d186e6a26b]
3: (OSD::init()+0x1f74) [0x55d186e7aec4]
4: (main()+0x29d1) [0x55d186de1d71]
5: (__libc_start_main()+0xf5) [0x7f24004fdf45]
6: (()+0x356a47) [0x55d186e2aa47]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
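For what it’s worth, the repair I was thinking of trying (rather than rebuilding the disk) is to pull the map epoch the OSD complains about from the mons and inject it into the OSD’s store with ceph-objectstore-tool. This is only a rough, untested sketch: I’m assuming the ceph-objectstore-tool shipped with 10.2.7 supports the get-osdmap/set-osdmap ops, and the cluster name, paths, "NNNN" OSD id and epoch 863078 below are just placeholders taken from the log above.

# With the OSDs in question stopped, fetch the map epoch osd.1908 says it is missing from the mons...
ceph --cluster txc1 osd getmap 863078 -o /tmp/osdmap.863078

# ...or, if the mons have already trimmed that epoch, extract it from a healthy OSD's store:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/txc1-NNNN \
#     --journal-path /var/lib/ceph/osd/txc1-NNNN/journal \
#     --op get-osdmap --epoch 863078 --file /tmp/osdmap.863078

# Inject the map into the broken OSD's store:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/txc1-1908 \
    --journal-path /var/lib/ceph/osd/txc1-1908/journal \
    --op set-osdmap --file /tmp/osdmap.863078

# Then restart the OSD and see whether load_pgs gets past pg 11.3f5a.

Would that be a sane way to repair the metadata, or is there a better approach?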
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com