Re: OSD crash with assertion

Hi,

Although changing an existing EC profile (with --force) is possible (I haven't tried it in Octopus yet), it won't have any effect on existing pools [1]:

Choosing the right profile is important because it cannot be modified after the pool is created: a new pool with a different profile needs to be created and all objects from the previous pool moved to the new.

You can either change the crush_rule for that pool to get a different distribution (but it won't change k and m) or follow Sylvain's description to copy the pool content to a new pool with the desired EC profile.
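If you go the crush_rule route, something like this should work (the rule name here is just an example, and the rule has to be created for your EC profile):

# create a new crush rule for the existing EC profile (name is arbitrary):
ceph osd crush rule create-erasure ecrule-new your-ec-profile
# assign it to the pool; this only changes placement, not k/m:
ceph osd pool set your-pool crush_rule ecrule-new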

Regards,
Eugen

[1] https://docs.ceph.com/docs/master/rados/operations/erasure-code/#erasure-code-profiles


Quoting Michael Fladischer <michael@xxxxxxxx>:

Hi Sylvain,

Yeah, that's the best and safest way to do it. The pool I wrecked was fortunately only a dummy pool.

The pool for which I want to change the EC profile is ~4 PiB large, so moving all files on it (the pool is used in CephFS) to a new pool might take some time, and I was hoping for an in-place configuration change. But as demonstrated by my own recklessness, this does not work and will take most of the OSDs down with it.
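If I end up migrating at the CephFS level instead of using rados cppool, I'd probably try something like this (untested sketch; pool, filesystem and directory names are placeholders):

# create the new EC pool and allow overwrites (needed for CephFS on EC):
ceph osd pool create cephfs_data_new 64 64 erasure ecprofile-new
ceph osd pool set cephfs_data_new allow_ec_overwrites true
# attach it as an additional data pool of the filesystem:
ceph fs add_data_pool myfs cephfs_data_new
# point a directory at the new pool via a file layout:
setfattr -n ceph.dir.layout.pool -v cephfs_data_new /mnt/cephfs/somedir
# existing files keep their old layout and have to be rewritten
# (e.g. copied and moved back) to actually land in the new pool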

Regards,
Michael

On 22.06.2020 at 21:39, St-Germain, Sylvain (SSC/SPC) wrote:
The way I did it: I created a new pool, copied the data onto it, and put the new pool in place of the old one; the former pool (renamed to .old below) can be deleted afterwards:

echo "--------------------------------------------------------------------"
echo " Create a new pool with erasure coding"
echo "--------------------------------------------------------------------"
sudo ceph osd pool create $pool.new 64 64 erasure ecprofile-5-3

echo "--------------------------------------------------------------------"
echo " Copy the original pool to the new pool"
echo "--------------------------------------------------------------------"
sudo rados cppool $pool $pool.new

echo "--------------------------------------------------------------------"
echo " Rename the original pool to .old"
echo "--------------------------------------------------------------------"
sudo ceph osd pool rename $pool $pool.old

echo "--------------------------------------------------------------------"
echo " Rename the new erasure coding pool to $pool"
echo "--------------------------------------------------------------------"
sudo ceph osd pool rename $pool.new $pool

echo "--------------------------------------------------------------------"
echo " Set the pool: $pool  to autoscaling"
echo "--------------------------------------------------------------------"
sudo ceph osd pool set $pool pg_autoscale_mode on

echo "--------------------------------------------------------------------"
echo " Show detail off the new create pool"
echo "--------------------------------------------------------------------"
sudo ceph osd pool get $pool all
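The script assumes the profile ecprofile-5-3 already exists; I created mine with something like this (the k/m values and failure domain are just what matches my setup, adjust as needed):

# profile with 5 data and 3 coding chunks, failure domain host:
sudo ceph osd erasure-code-profile set ecprofile-5-3 k=5 m=3 crush-failure-domain=host

Also note that rados cppool has known limitations (it does not preserve all metadata), so verify the result before deleting the old pool.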

Sylvain

-----Original Message-----
From: Michael Fladischer <michael@xxxxxxxx>
Sent: 22 June 2020 15:23
To: ceph-users@xxxxxxx
Subject: Re: OSD crash with assertion

Turns out, I really messed up when changing the EC profile. Removing the pool did not get rid of its PGs on the OSDs that had crashed.

To get my OSDs back up I used ceph-objectstore-tool like this:

for PG in $(ceph-objectstore-tool --data-path $DIR --type=bluestore --op=list-pgs | grep "^${POOL_ID}\."); do
    ceph-objectstore-tool --data-path $DIR --type=bluestore --op=remove --force --pgid $PG
done

$DIR is the data path of the crashed OSD.
$POOL_ID is the ID of the pool with the messed up EC profile.

I'm now curious if there is an easier way to do this?

After getting rid of all those PGs, the OSDs were able to start again. Hope this helps someone.

Regards,
Michael


On 22.06.2020 at 19:46, Michael Fladischer wrote:
Hi,

a lot of our OSDs crashed a few hours ago because of a failed assertion:

/build/ceph-15.2.3/src/osd/ECUtil.h: 34: FAILED
ceph_assert(stripe_width % stripe_size == 0)

Full output here:
https://pastebin.com/D1SXzKsK

All OSDs are on bluestore and run 15.2.3.

I think I messed up when I tried to change an existing EC profile (using --force) for an active EC pool.

I already tried to delete the pool and the EC profile and start the
OSDs but they keep crashing with the same assertion.

Is there a way to at least find out what the values are for
stripe_width and stripe_size?
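Maybe something along these lines would show them (just a guess on my side, untested):

# stripe_width is listed per pool here:
ceph osd pool ls detail
# the profile shows k, m and, if set, stripe_unit:
ceph osd erasure-code-profile get <profile-name>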

Regards,
Michael



