We got our OSDs back!

Since we had removed the EC pool (cephfs.data), we had to figure out how to
remove its PGs from the offline OSDs; here is how we did it.

Remove cephfs, remove the cache layer, remove the pools:

# ceph mds fail 0
# ceph fs rm cephfs --yes-i-really-mean-it
# ceph osd tier remove-overlay cephfs.data
there is now (or already was) no overlay for 'cephfs.data'
# ceph osd tier remove cephfs.data cephfs.cache
pool 'cephfs.cache' is now (or already was) not a tier of 'cephfs.data'
# ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
# ceph osd pool delete cephfs.cache cephfs.cache --yes-i-really-really-mean-it
pool 'cephfs.cache' removed
# ceph osd pool delete cephfs.data cephfs.data --yes-i-really-really-mean-it
pool 'cephfs.data' removed
# ceph osd pool delete cephfs.metadata cephfs.metadata --yes-i-really-really-mean-it
pool 'cephfs.metadata' removed

Remove the placement groups of pool 23 (cephfs.data) from all offline OSDs:

DATAPATH=/var/lib/ceph/osd/ceph-${OSD}
a=`ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\."`
for i in $a; do
  echo "INFO: removing ${i} from OSD ${OSD}"
  ceph-objectstore-tool --data-path ${DATAPATH} --pgid ${i} --op remove --force
done

Since we have now removed our cephfs, we still do not know whether we could
have solved this without data loss by upgrading to Nautilus.
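For completeness, here is a rough sketch of how the per-OSD steps above
could be wrapped into a single script. Treat it as a sketch only: it
assumes a systemd deployment (ceph-osd@<id> units), and OSD_IDS is a
made-up placeholder for the IDs of the offline OSDs that still carry PGs
of the removed pool.

    #!/usr/bin/env bash
    # Sketch: purge the PGs of a deleted pool from a list of offline OSDs.
    OSD_IDS="44 45 46"   # placeholder, replace with your offline OSD ids
    POOL_ID=23           # pool id of the removed cephfs.data pool

    for OSD in ${OSD_IDS}; do
        DATAPATH=/var/lib/ceph/osd/ceph-${OSD}

        # make sure the OSD daemon is not running while we touch its store
        systemctl stop ceph-osd@${OSD}

        # remove every PG of the pool from this OSD's object store
        for PG in $(ceph-objectstore-tool --data-path "${DATAPATH}" --op list-pgs | grep "^${POOL_ID}\."); do
            echo "INFO: removing ${PG} from OSD ${OSD}"
            ceph-objectstore-tool --data-path "${DATAPATH}" --pgid "${PG}" --op remove --force
        done

        # sanity check: only restart the OSD if no PG of the pool is left
        if ceph-objectstore-tool --data-path "${DATAPATH}" --op list-pgs | grep -q "^${POOL_ID}\."; then
            echo "WARNING: OSD ${OSD} still has PGs of pool ${POOL_ID}" >&2
        else
            systemctl start ceph-osd@${OSD}
        fi
    done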
Have a nice weekend,
Ansgar

On Wed, Aug 7, 2019 at 17:03 Ansgar Jazdzewski
<a.jazdzewski@xxxxxxxxxxxxxx> wrote:
>
> another update,
>
> we now took the more destructive route and removed the cephfs pools
> (luckily we only had test data in the filesystem).
> Our hope was that during the startup process the OSDs would delete the
> no longer needed PGs, but this is NOT the case.
>
> So we still have the same issue; the only difference is that the PGs
> no longer belong to a pool.
>
>   -360> 2019-08-07 14:52:32.655 7fb14db8de00  5 osd.44 pg_epoch: 196586
> pg[23.f8s0(unlocked)] enter Initial
>   -360> 2019-08-07 14:52:32.659 7fb14db8de00 -1
> /build/ceph-13.2.6/src/osd/ECUtil.h: In function
> 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread
> 7fb14db8de00 time 2019-08-07 14:52:32.660169
> /build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width %
> stripe_size == 0)
>
> We can now take one route and try to delete the PGs by hand in the OSD
> (BlueStore); how can this be done? Or we try to upgrade to Nautilus and
> hope for the best.
>
> Any help or hints are welcome,
> have a nice one
> Ansgar
>
> On Wed, Aug 7, 2019 at 11:32 Ansgar Jazdzewski
> <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > as a follow-up:
> > * a full log of one OSD failing to start: https://pastebin.com/T8UQ2rZ6
> > * our EC-pool creation in the first place: https://pastebin.com/20cC06Jn
> > * ceph osd dump and ceph osd erasure-code-profile get cephfs:
> >   https://pastebin.com/TRLPaWcH
> >
> > As we dig further into it, it looks like a bug in the cephfs or
> > erasure-coding part of Ceph.
> >
> > Ansgar
> >
> >
> > On Tue, Aug 6, 2019 at 14:50 Ansgar Jazdzewski
> > <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> > >
> > > hi folks,
> > >
> > > we had to move one of our clusters, so we had to reboot all servers;
> > > now we see an error on all OSDs with the EC pool.
> > >
> > > Are we missing some options? Will an upgrade to 13.2.6 help?
> > >
> > > Thanks,
> > > Ansgar
> > >
> > > 2019-08-06 12:10:16.265 7fb337b83200 -1
> > > /build/ceph-13.2.4/src/osd/ECUtil.h: In function
> > > 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread
> > > 7fb337b83200 time 2019-08-06 12:10:16.263025
> > > /build/ceph-13.2.4/src/osd/ECUtil.h: 34: FAILED assert(stripe_width %
> > > stripe_size == 0)
> > >
> > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic
> > > (stable)
> > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > > const*)+0x102) [0x7fb32eeb83c2]
> > > 2: (()+0x2e5587) [0x7fb32eeb8587]
> > > 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&,
> > > boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> > > CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned
> > > long)+0x4de) [0xa4cbbe]
> > > 4: (PGBackend::build_pg_backend(pg_pool_t const&,
> > > std::map<std::__cxx11::basic_string<char, std::char_traits<char>,
> > > std::allocator<char> >, std::__cxx11::basic_string<char,
> > > std::char_traits<char>, std::allocator<char> >,
> > > std::less<std::__cxx11::basic_string<char, std::char_traits<char>,
> > > std::allocator<char> > >,
> > > std::allocator<std::pair<std::__cxx11::basic_string<char,
> > > std::char_traits<char>, std::allocator<char> > const,
> > > std::__cxx11::basic_string<char, std::char_traits<char>,
> > > std::allocator<char> > > > > const&, PGBackend::Listener*, coll_t,
> > > boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> > > CephContext*)+0x2f9) [0x9474e9]
> > > 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap
> > > const>, PGPool const&, std::map<std::__cxx11::basic_string<char,
> > > std::char_traits<char>, std::allocator<char> >,
> > > std::__cxx11::basic_string<char, std::char_traits<char>,
> > > std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
> > > std::char_traits<char>, std::allocator<char> > >,
> > > std::allocator<std::pair<std::__cxx11::basic_string<char,
> > > std::char_traits<char>, std::allocator<char> > const,
> > > std::__cxx11::basic_string<char, std::char_traits<char>,
> > > std::allocator<char> > > > > const&, spg_t)+0x138) [0x8f96e8]
> > > 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x11d3)
> > > [0x753553]
> > > 7: (OSD::load_pgs()+0x4a9) [0x758339]
> > > 8: (OSD::init()+0xcd3) [0x7619c3]
> > > 9: (main()+0x3678) [0x64d6a8]
> > > 10: (__libc_start_main()+0xf0) [0x7fb32ca68830]
> > > 11: (_start()+0x29) [0x717389]
> > > NOTE: a copy of the executable, or objdump -rdS <executable> is
> > > needed to interpret this.
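A note on the assert itself: as far as we can tell from ECUtil.h and
ECBackend.cc, the failed check compares the pool's stripe_width against the
number of data chunks (k) of its erasure-code profile, i.e. stripe_width
must be a multiple of k. To put the two values side by side (using the
profile name 'cephfs' and pool id 23 from this thread, adjust as needed):

# k, m and stripe_unit of the erasure-code profile
ceph osd erasure-code-profile get cephfs

# the pool line in the osdmap shows the pool's stripe_width
ceph osd dump | grep "^pool 23 "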