Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true


 



Hi,

It is quite an old cluster (fortunately, not the production one); it was created under Luminous if I remember correctly.

Cordialement, Regards,
Lionel BEARD
CLS - IT & Operations
11 rue Hermès, Parc Technologique du Canal
31520 Ramonville Saint-Agne – France
Tél : +33 (0)5 61 39 39 19

-----Original Message-----
From: Igor Fedotov <igor.fedotov@xxxxxxxx>
Sent: Tuesday, October 26, 2021 00:39
To: Beard Lionel <lbeard@xxxxxxxxxxxx>; ceph-users@xxxxxxx
Subject: Re: Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true



Hi Beard,

curious if that cluster had been created by a pre-Nautilus release, e.g. Luminous or Kraken?


Thanks,

Igor

On 10/22/2021 3:53 PM, Beard Lionel wrote:
> Hi,
>
> I had exactly the same behaviour:
> - upgrade from Nautilus to Pacific
> - same warning message
> - set the config option
> - restart OSDs: I first restarted one OSD and it was fine, so I decided to restart all OSDs on the same host, and about half of them can no longer start, with the same error as yours (see the sketch after this list for what I mean by the rolling restart).
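>
> For context, a minimal sketch of that per-host rolling restart, assuming systemd-managed OSDs (the osd ids below are placeholders, adjust to your host):
>
>   ceph osd set noout                     # avoid rebalancing while OSDs bounce
>   for id in 0 1 2 3 4; do                # the osd ids living on this host
>       systemctl restart ceph-osd@${id}   # the quick-fix fsck runs during OSD start-up
>       ceph -s                            # check the osd is back up before moving on
>   done
>   ceph osd unset noout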
>
> We didn't find any workaround, apart from deleting and recreating the failed OSDs ☹ (roughly as sketched below).
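>
> Roughly, the delete-and-recreate path looks like this (a sketch only, assuming ceph-volume/LVM-based OSDs; osd id 7 and /dev/sdX are placeholders):
>
>   systemctl stop ceph-osd@7                   # the OSD is crashing on start anyway
>   ceph osd out 7                              # let the cluster backfill away from it
>   ceph osd purge 7 --yes-i-really-mean-it     # drop it from the CRUSH map and osdmap
>   ceph-volume lvm zap --destroy /dev/sdX      # wipe the old device
>   ceph-volume lvm create --data /dev/sdX      # recreate a fresh OSD on it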
>
> For the MDS, which was also crashing, I had to follow the recovery procedure to recover my data:
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
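>
> In short, the heart of that procedure is the cephfs-data-scan passes below (an abbreviated sketch; the exact flags, the recovery metadata pool / recovery filesystem setup, plus the cephfs-table-tool and cephfs-journal-tool resets, differ per release, so follow the linked page; <data pool> is a placeholder):
>
>   cephfs-data-scan init                        # recreate the root and MDS directory inodes
>   cephfs-data-scan scan_extents <data pool>    # pass 1: recover file sizes and mtimes from the data objects
>   cephfs-data-scan scan_inodes <data pool>     # pass 2: rebuild inode/backtrace metadata
>   cephfs-data-scan scan_links                  # pass 3: fix link counts and dangling dentries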
>
> Cordialement, Regards,
> Lionel BEARD
> CLS - IT & Operations
>
> -----Original Message-----
> From: Marek Grzybowski <marek.grzybowski@xxxxxxxxx> On behalf of mgrzybowski
> Sent: Wednesday, October 20, 2021 23:56
> To: ceph-users@xxxxxxx
> Subject: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true
>
>
>
> Hi,
>     Recently I performed upgrades on a single-node CephFS server I have.
>
> # ceph fs ls
> name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ecpoolk3m1osd ecpoolk5m1osd ecpoolk4m2osd ]
>
> ~# ceph osd pool ls detail
> pool 20 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 10674 lfor 0/0/5088 flags hashpspool stripe_width 0 application cephfs
> pool 21 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 10674 lfor 0/0/5179 flags hashpspool stripe_width 0 application cephfs
> pool 22 'ecpoolk3m1osd' erasure profile myprofilek3m1osd size 4 min_size 3 crush_rule 3 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode warn last_change 10674 lfor 0/0/1442 flags hashpspool,ec_overwrites stripe_width 12288 compression_algorithm zstd compression_mode aggressive application cephfs
> pool 23 'ecpoolk5m1osd' erasure profile myprofilek5m1osd size 6 min_size 5 crush_rule 5 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 12517 lfor 0/0/7892 flags hashpspool,ec_overwrites stripe_width 20480 compression_algorithm zstd compression_mode aggressive application cephfs
> pool 24 'ecpoolk4m2osd' erasure profile myprofilek4m2osd size 6 min_size 5 crush_rule 6 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 10674 flags hashpspool,ec_overwrites stripe_width 16384 compression_algorithm zstd compression_mode aggressive application cephfs
> pool 25 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 11033 lfor 0/0/10991 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
>
>
> I started this upgrade from Ubuntu 16.04 and Luminous (there were upgrades in the past, and some OSDs could have been created back on Kraken):
> - first I upgraded Ceph to Nautilus; everything seemed to go well according to the docs, no warning in status
> - then I did "do-release-upgrade" to Ubuntu 18.04 (the Ceph packages were not touched by that upgrade)
> - then I did "do-release-upgrade" to Ubuntu 20.04 (this upgrade bumped the Ceph packages to 15.2.1-0ubuntu1; before each do-release-upgrade I removed /etc/ceph/ceph.conf, so at least the mon daemon was down, and the OSDs should not start since the volumes are encrypted)
> - next I upgraded the Ceph packages to 16.2.6-1focal and started the daemons (a post-upgrade check is sketched after this list)
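>
> For reference, the post-upgrade check mentioned above is just (a sketch, assuming Pacific is the final target release):
>
>   ceph versions                            # every mon/mgr/osd/mds should report 16.2.6
>   ceph osd require-osd-release pacific     # per the Pacific upgrade notes, once all OSDs run Pacific
>   ceph -s                                  # overall health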
>
> All seemed to work well; the only thing left was this warning:
>
> 10 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
>
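> (As an aside, "ceph health detail" should list which osd.N are behind that warning, e.g.:)
>
>   ceph health detail | grep -i omap        # enumerate the OSDs still on legacy (not per-pool) omap stats
>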
> I found on the list that it is recommended to set:
>
> ceph config set osd bluestore_fsck_quick_fix_on_mount true
>
> and do a rolling restart of the OSDs. After the first restart+fsck I got a crash on an OSD (and on the MDS too):
>
>       -1> 2021-10-14T22:02:45.877+0200 7f7f080a4f00 -1 /build/ceph-16.2.6/src/osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*)' thread 7f7f080a4f00 time 2021-10-14T22:02:45.878154+0200
> /build/ceph-16.2.6/src/osd/PG.cc: 1009: FAILED ceph_assert(values.size() == 2)
>    ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55e29cd0ce61]
>    2: /usr/bin/ceph-osd(+0xac6069) [0x55e29cd0d069]
>    3: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0xa17) [0x55e29ce97057]
>    4: (OSD::load_pgs()+0x6b4) [0x55e29ce07ec4]
>    5: (OSD::init()+0x2b4e) [0x55e29ce14a6e]
>    6: main()
>    7: __libc_start_main()
>    8: _start()
>
>
> The same happened on the next OSD restart+fsck:
>
>       -1> 2021-10-17T22:47:49.291+0200 7f98877bff00 -1 /build/ceph-16.2.6/src/osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*)' thread 7f98877bff00 time 2021-10-17T22:47:49.292912+0200
> /build/ceph-16.2.6/src/osd/PG.cc: 1009: FAILED ceph_assert(values.size() == 2)
>
>    ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
>    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x560e09af7e61]
>    2: /usr/bin/ceph-osd(+0xac6069) [0x560e09af8069]
>    3: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0xa17) [0x560e09c82057]
>    4: (OSD::load_pgs()+0x6b4) [0x560e09bf2ec4]
>    5: (OSD::init()+0x2b4e) [0x560e09bffa6e]
>    6: main()
>    7: __libc_start_main()
>    8: _start()
>
>
> Once crashed, the OSDs could not be brought back online; they crash again whenever I try to start them.
> A deep fsck did not find anything:
>
> ~# ceph-bluestore-tool --command fsck --deep yes --path /var/lib/ceph/osd/ceph-2
> fsck success
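>
> For completeness, other data points that can be gathered from a crashed OSD at this stage (a sketch; osd.2 is just the example from above, and the repair call is a guess rather than a known fix for this assert):
>
>   ceph-bluestore-tool --command repair --path /var/lib/ceph/osd/ceph-2   # like fsck, but attempts to fix what it finds
>   ceph-osd -i 2 -d --debug-osd 20 --debug-bluestore 20                   # run in the foreground with verbose logging to capture the assert context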
>
>
> Any ideas what could cause these crashes, and is it possible to bring an OSD that crashed this way back online?
>
>
> --
>     mgrzybowski

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



