Re: OSDs keep crashing after cluster reboot

We got our OSDs back.

Since we removed the EC pool (cephfs.data) we had to figure out how to
remove the PGs from the offline OSDs, and here is how we did it.

remove cephfs, remove the cache layer, remove the pools:
#ceph mds fail 0
#ceph fs rm cephfs --yes-i-really-mean-it
#ceph osd tier remove-overlay cephfs.data
there is now (or already was) no overlay for 'cephfs.data'
#ceph osd tier remove cephfs.data cephfs.cache
pool 'cephfs.cache' is now (or already was) not a tier of 'cephfs.data'
#ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
#ceph osd pool delete cephfs.cache cephfs.cache --yes-i-really-really-mean-it
pool 'cephfs.cache' removed
#ceph osd pool delete cephfs.data cephfs.data --yes-i-really-really-mean-it
pool 'cephfs.data' removed
#ceph osd pool delete cephfs.metadata cephfs.metadata --yes-i-really-really-mean-it
pool 'cephfs.metadata' removed
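
A side note for anyone reading along (just context, not one of the steps we
had to run): the pool id that the PG names are prefixed with (23 in our case)
can be looked up while the pool still exists, for example with one of:
#ceph osd pool ls detail
#ceph osd dump | grep "^pool"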

remove placement groups of pool 23 (cephfs.data) from all offline OSDs:
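# note: ${OSD} is assumed to be set to the id of the offline OSD to clean up;
# the ceph-osd daemon for it must not be running while ceph-objectstore-tool
# operates on its data path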
DATAPATH=/var/lib/ceph/osd/ceph-${OSD}
a=`ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\."`
for i in $a; do
  echo "INFO: removing ${i} from OSD ${OSD}"
  ceph-objectstore-tool --data-path ${DATAPATH} --pgid ${i} --op remove --force
done
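
For completeness, a small sanity check one could run afterwards (only a
sketch, assuming the OSDs are managed by systemd; adjust to your setup):
confirm that no PGs of pool 23 are left in the store, then start the OSD again.

ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\." \
  || echo "INFO: no pool-23 PGs left on OSD ${OSD}"
systemctl start ceph-osd@${OSD}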

Since we have now removed our CephFS, we still do not know whether we could
have solved it without data loss by upgrading to Nautilus.

Have a nice weekend,
Ansgar

On Wed, 7 Aug 2019 at 17:03, Ansgar Jazdzewski
<a.jazdzewski@xxxxxxxxxxxxxx> wrote:
>
> another update,
>
> we now took the more destructive route and removed the cephfs pools
> (luckily we had only test data in the filesystem)
> Our hope was that during the startup process the OSDs would delete the
> no longer needed PGs, but this is NOT the case.
>
> So we still have the same issue; the only difference is that the PG
> does not belong to a pool anymore.
>
>  -360> 2019-08-07 14:52:32.655 7fb14db8de00  5 osd.44 pg_epoch: 196586
> pg[23.f8s0(unlocked)] enter Initial
>  -360> 2019-08-07 14:52:32.659 7fb14db8de00 -1
> /build/ceph-13.2.6/src/osd/ECUtil.h: In function
> 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread
> 7fb14db8de00 time 2019-08-07 14:52:32.660169
> /build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width %
> stripe_size == 0)
>
> We can now take one route and try to delete the PG by hand on the OSD
> (BlueStore); how can this be done? Or we try to upgrade to Nautilus and
> hope for the best.
>
> any help or hints are welcome,
> have a nice one
> Ansgar
>
> On Wed, 7 Aug 2019 at 11:32, Ansgar Jazdzewski
> <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > as a follow-up:
> > * a full log of one OSD failing to start https://pastebin.com/T8UQ2rZ6
> > * our EC-pool creation in the first place https://pastebin.com/20cC06Jn
> > * ceph osd dump and ceph osd erasure-code-profile get cephfs
> > https://pastebin.com/TRLPaWcH
> >
> > As we dig deeper into it, it looks like a bug in the CephFS or
> > erasure-coding part of Ceph.
> >
> > Ansgar
> >
> >
> > On Tue, 6 Aug 2019 at 14:50, Ansgar Jazdzewski
> > <a.jazdzewski@xxxxxxxxxxxxxx> wrote:
> > >
> > > hi folks,
> > >
> > > we had to move one of our clusters, so we had to reboot all servers; now
> > > we see an error on all OSDs with the EC pool.
> > >
> > > are we missing some options? Will an upgrade to 13.2.6 help?
> > >
> > >
> > > Thanks,
> > > Ansgar
> > >
> > > 2019-08-06 12:10:16.265 7fb337b83200 -1
> > > /build/ceph-13.2.4/src/osd/ECUtil.h: In function
> > > 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread
> > > 7fb337b83200 time 2019-08-06 12:10:16.263025
> > > /build/ceph-13.2.4/src/osd/ECUtil.h: 34: FAILED assert(stripe_width %
> > > stripe_size == 0)
> > >
> > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fb32eeb83c2]
> > >  2: (()+0x2e5587) [0x7fb32eeb8587]
> > >  3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x4de) [0xa4cbbe]
> > >  4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x2f9) [0x9474e9]
> > >  5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, spg_t)+0x138) [0x8f96e8]
> > >  6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x11d3) [0x753553]
> > >  7: (OSD::load_pgs()+0x4a9) [0x758339]
> > >  8: (OSD::init()+0xcd3) [0x7619c3]
> > >  9: (main()+0x3678) [0x64d6a8]
> > >  10: (__libc_start_main()+0xf0) [0x7fb32ca68830]
> > >  11: (_start()+0x29) [0x717389]
> > >  NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


