Re: OSDs keep crashing after cluster reboot

Hi all,

for those reading along: We had to turn off all OSDs holding our cephfs-data pool during the intervention; luckily, everything came back fine.
However, we managed to keep the MDSs, the OSDs holding the cephfs-metadata pool, and the MONs online. We restarted those sequentially afterwards, though.

So this probably means we are not affected by the upgrade bug. Still, I would sleep better if somebody could confirm how to detect this bug and, if you are affected, how to edit the pool
to fix it.
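
In case it helps others checking for this: if I read ECUtil.h and ECBackend.cc correctly, the failing assert boils down to the pool's stripe_width having to be a multiple of the profile's data chunk count k (the first argument passed to stripe_info_t is the data chunk count). Under that assumption, a rough, untested sketch like the following should tell you whether a pool would trip it (the pool name is just our example):
----------------------------------------------------
POOL=cephfs_data
# profile assigned to the pool, and its k
PROFILE=$(ceph osd pool get ${POOL} erasure_code_profile | awk '{print $2}')
K=$(ceph osd erasure-code-profile get ${PROFILE} | awk -F= '/^k=/{print $2}')
# stripe_width as stored in the osdmap
WIDTH=$(ceph osd dump | awk -v p="'${POOL}'" '$3 == p { for (i = 1; i < NF; i++) if ($i == "stripe_width") print $(i+1) }')
echo "k=${K} stripe_width=${WIDTH}"
if [ $(( WIDTH % K )) -eq 0 ]; then
    echo "stripe_width is a multiple of k - the assert should hold"
else
    echo "stripe_width is NOT a multiple of k - OSDs would hit the assert"
fi
----------------------------------------------------
Whether that check is really sufficient is exactly what I would like someone to confirm.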

Cheers,
	Oliver

On 2019-09-17 21:23, Oliver Freyermuth wrote:
Hi all,

it seems the issue described by Ansgar was reported and closed here as being fixed for newly created pools in post-Luminous releases:
https://tracker.ceph.com/issues/41336

However, it is unclear to me:
- How to find out whether an EC CephFS created in Luminous is actually affected, before testing the "shutdown all" procedure
   and thus ending up with dying OSDs.
- If affected, how to fix it without purging the pool completely (which is not easily done with 0.5 PB inside that can't be restored without a long downtime).

If this is an acknowledged issue, it should probably also be mentioned in the upgrade notes from pre-Mimic to Mimic and newer before more people lose data.

In our case, we have such a CephFS on an EC pool created with Luminous, and are right now running Mimic 13.2.6, but we have never tried a "full shutdown".
We need to try that on Friday, though... (cooling system maintenance).

"osd dump" contains:
----------------------------------------------------
pool 1 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 40903 flags hashpspool stripe_width 0 compression_algorithm snappy compression_mode aggressive application cephfs
pool 2 'cephfs_data' erasure size 6 min_size 5 crush_rule 2 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 40953 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384 compression_algorithm snappy compression_mode aggressive application cephfs
----------------------------------------------------

and the EC profile is:
----------------------------------------------------
# ceph osd erasure-code-profile get cephfs_data
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
----------------------------------------------------

Neither contains the stripe_unit explicitly, so I wonder how to find out if it is (in)valid.
Checking the xattr ceph.file.layout.stripe_unit of some "old" files on the FS reveals 4194304 in my case.
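For reference, I checked that via getfattr on the mounted filesystem, roughly like this (the path is of course just an example):
getfattr -n ceph.file.layout.stripe_unit /cephfs/some/old/file
getfattr -n ceph.file.layout /cephfs/some/old/file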

Any help appreciated.

Cheers and all the best,
     Oliver

On 2019-08-09 08:54, Ansgar Jazdzewski wrote:
We got our OSDs back.

Since we removed the EC pool (cephfs.data), we had to figure out how to
remove its PGs from the offline OSDs, and here is how we did it.

remove cephfs, remove cache layer, remove pools:
#ceph mds fail 0
#ceph fs rm cephfs --yes-i-really-mean-it
#ceph osd tier remove-overlay cephfs.data
there is now (or already was) no overlay for 'cephfs.data'
#ceph osd tier remove cephfs.data cephfs.cache
pool 'cephfs.cache' is now (or already was) not a tier of 'cephfs.data'
#ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
#ceph osd pool delete cephfs.cache cephfs.cache --yes-i-really-really-mean-it
pool 'cephfs.cache' removed
#ceph osd pool delete cephfs.data cephfs.data --yes-i-really-really-mean-it
pool 'cephfs.data' removed
#ceph osd pool delete cephfs.metadata cephfs.metadata --yes-i-really-really-mean-it
pool 'cephfs.metadata' removed
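
a quick way to double-check that nothing of the filesystem is left afterwards would be:
#ceph fs ls
#ceph osd lspools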

remove placement groups of pool 23 (cephfs.data) from all offline OSDs
(run on each affected host with the OSD stopped, setting OSD to its numeric id first):
OSD=44                                    # numeric id of the stopped OSD, adjust per host
DATAPATH=/var/lib/ceph/osd/ceph-${OSD}
# list all PGs of pool 23 held by this OSD, then remove them one by one
a=$(ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\.")
for i in $a; do
   echo "INFO: removing ${i} from OSD ${OSD}"
   ceph-objectstore-tool --data-path ${DATAPATH} --pgid ${i} --op remove --force
done
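
before starting an OSD again, the same listing should come back empty for pool 23; something like this can confirm it:
ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\." || echo "INFO: no PGs of pool 23 left on OSD ${OSD}"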

Since we have now removed our CephFS, we still do not know whether we could have
solved it without data loss by upgrading to Nautilus.

Have a nice weekend,
Ansgar

On Wed, 7 Aug 2019 at 17:03, Ansgar Jazdzewski
<a.jazdzewski@xxxxxxxxxxxxxx> wrote:

another update,

We now took the more destructive route and removed the cephfs pools
(luckily we had only test data in the filesystem).
Our hope was that during the startup process the OSDs would delete the
no longer needed PGs, but this is NOT the case.

So we still have the same issue; the only difference is that the PGs
no longer belong to a pool.

  -360> 2019-08-07 14:52:32.655 7fb14db8de00  5 osd.44 pg_epoch: 196586 pg[23.f8s0(unlocked)] enter Initial
  -360> 2019-08-07 14:52:32.659 7fb14db8de00 -1 /build/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 7fb14db8de00 time 2019-08-07 14:52:32.660169
/build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0)

We can now take one route and try to delete the PGs by hand on the
(BlueStore) OSD; how can this be done? Or we try to upgrade to Nautilus
and hope for the best.

Any help or hints are welcome,
have a nice one
Ansgar

On Wed, 7 Aug 2019 at 11:32, Ansgar Jazdzewski
<a.jazdzewski@xxxxxxxxxxxxxx> wrote:

Hi,

as a follow-up:
* a full log of one OSD failing to start https://pastebin.com/T8UQ2rZ6
* our EC pool creation in the first place https://pastebin.com/20cC06Jn
* ceph osd dump and ceph osd erasure-code-profile get cephfs
https://pastebin.com/TRLPaWcH

As we dig deeper into it, it looks like a bug in the CephFS or
erasure-coding part of Ceph.

Ansgar


On Tue, 6 Aug 2019 at 14:50, Ansgar Jazdzewski
<a.jazdzewski@xxxxxxxxxxxxxx> wrote:

hi folks,

We had to move one of our clusters, so we had to reboot all servers;
now we are seeing an error on all OSDs with the EC pool.

Are we missing some options? Will an upgrade to 13.2.6 help?


Thanks,
Ansgar

2019-08-06 12:10:16.265 7fb337b83200 -1 /build/ceph-13.2.4/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 7fb337b83200 time 2019-08-06 12:10:16.263025
/build/ceph-13.2.4/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0)

ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fb32eeb83c2]
 2: (()+0x2e5587) [0x7fb32eeb8587]
 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x4de) [0xa4cbbe]
 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x2f9) [0x9474e9]
 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, spg_t)+0x138) [0x8f96e8]
 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x11d3) [0x753553]
 7: (OSD::load_pgs()+0x4a9) [0x758339]
 8: (OSD::init()+0xcd3) [0x7619c3]
 9: (main()+0x3678) [0x64d6a8]
 10: (__libc_start_main()+0xf0) [0x7fb32ca68830]
 11: (_start()+0x29) [0x717389]
 NOTE: a copy of the executable, or objdump -rdS <executable> is needed to interpret this.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
