Re: Upgrade from Octopus to Quincy fails on third ceph-mon

Sorry, I was busy and couldn't reply. But it's great that the issue is resolved!

Quoting "Ackermann, Christoph" <c.ackermann@xxxxxxxxxxxx>:

Eugen,

I had to recreate all five monitors from scratch, so now I have a valid
quorum (5/5) and all mons are on v17.2.3.

This was really strange. The main problem was seemingly:

mon.ceph1n011@0(electing) e46 handle_auth_request failed to assign global_id

and so forth.

The one mon with leveldb I reinstalled with the correct kv_backend.

Thanks for the helping hand,

Christoph




On Wed, 28 Sep 2022 at 13:13, Ackermann, Christoph <
c.ackermann@xxxxxxxxxxxx> wrote:

Eugen,

thank you very much. This (ceph1n021) is indeed the one using leveldb (in
kv_backend).

The other mons have kv_backend "rocksdb", but unfortunately, after
reinstalling ceph-mon@ceph1n021 we get no ceph status anymore, and our mon
logs get filled with:

2022-09-28T13:10:54.822+0200 7fbc6a863700  0 log_channel(cluster) log
[INF] : mon.ceph1n011 calling monitor election
2022-09-28T13:10:54.824+0200 7fbc6a863700  1
paxos.0).electionLogic(789785) init, last seen epoch 789785, mid-election,
bumping
2022-09-28T13:10:54.827+0200 7fbc6a863700  1 mon.ceph1n011@0(electing)
e46 collect_metadata sda:  no unique device id for sda: fallback method has
model 'Virtual disk    ' but no serial'


2022-09-28T07:11:37.380-0400 7efc2d6ec700  1 mon.ceph1n020@2(electing)
e46 collect_metadata md125:  no unique device id for md125: fallback method
has no model nor serial
2022-09-28T07:11:37.384-0400 7efc2d6ec700  1 mon.ceph1n020@2(electing)
e46 collect_metadata md125:  no unique device id for md125: fallback method
has no model nor serial
2022-09-28T07:11:37.389-0400 7efc2d6ec700  1 mon.ceph1n020@2(electing)
e46 collect_metadata md125:  no unique device id for md125: fallback method
has no model nor serial
2022-09-28T07:11:37.393-0400 7efc2d6ec700  1 mon.ceph1n020@2(electing)
e46 collect_metadata md125:  no unique device id for md125: fallback method
has no model nor serial
[root@ceph1n020 ~]#

It seems there is a problem with the /var device or some ...

Our production cluster is now in an urgent state. :-(

Do you have any hints for us?

Best regards,
Christoph




On Wed, 28 Sep 2022 at 12:21, Eugen Block <eblock@xxxxxx> wrote:

Hi,

there was a thread about deprecating leveldb [1], but I didn't get the
impression that it has already been deprecated. The thread does mention
that it's no longer tested, though, which might explain this. To confirm
that you use leveldb, you can run:

cat /var/lib/ceph/mon/ceph-<MON>/kv_backend
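To check all mons on a node at once, a small helper like the following could be used (a sketch assuming the default non-cephadm layout under /var/lib/ceph/mon; the function name and the extension-based fallback are my own, not an official tool):

```shell
# list_kv_backends BASE_DIR: print "<mon dir>: <backend>" for every mon
# store under BASE_DIR. The kv_backend file is authoritative; as a
# fallback, .ldb files in store.db indicate leveldb and .sst files
# indicate rocksdb.
list_kv_backends() {
    for d in "$1"/ceph-*; do
        if [ -f "$d/kv_backend" ]; then
            printf '%s: %s\n' "${d##*/}" "$(cat "$d/kv_backend")"
        elif ls "$d"/store.db/*.ldb >/dev/null 2>&1; then
            printf '%s: leveldb (guessed from .ldb files)\n' "${d##*/}"
        elif ls "$d"/store.db/*.sst >/dev/null 2>&1; then
            printf '%s: rocksdb (guessed from .sst files)\n' "${d##*/}"
        fi
    done
}

# On a mon node: list_kv_backends /var/lib/ceph/mon
```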

So you have already successfully upgraded other MONs; which kv_backend
do they use? If this is the last one with leveldb, you can probably move
the old store content aside and recreate an empty MON.
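Recreating the mon could look roughly like the sketch below. It follows the manual monitor-deployment procedure, but the mon id, paths, and the assumption that the mon keeps its name and address are all mine and untested here; verify each step against the Ceph documentation before running anything on a production cluster.

```shell
# Hypothetical rebuild of one mon (here: ceph1n021) with a fresh, empty
# store. Wrapped in a function so nothing runs by accident; invoke it
# manually on the affected node only, with a healthy quorum elsewhere.
rebuild_mon() {
    mon="ceph1n021"                      # assumed mon id
    store="/var/lib/ceph/mon/ceph-$mon"  # default non-cephadm path

    systemctl stop "ceph-mon@$mon"
    mv "$store/store.db" "$store/store.db.old"   # keep old store as backup

    # Fetch the current monmap and mon keyring from the healthy quorum.
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring

    # Recreate an empty mon store; current releases create rocksdb.
    ceph-mon -i "$mon" --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    chown -R ceph:ceph "$store"
    systemctl start "ceph-mon@$mon"
}
```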

[1]

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/K4OSAA4AJS2V7FQI6GNCKCK3IRQDBQRS/

Quoting "Ackermann, Christoph" <c.ackermann@xxxxxxxxxxxx>:

> Hello List,
>
> I'm in the middle of upgrading our "non-cephadm" cluster from Octopus to
> Quincy. It fails/gets stuck on the third ceph-mon, ceph1n021, with a
> strange error:
>
> 2022-09-28T11:04:27.691+0200 7f8681543880 -1 _open error initializing
> leveldb db back storage in /var/lib/ceph/mon/ceph-ceph1n021/store.db
>
> This monitor contains a lot of .ldb files; I was wondering, since we
> supposedly don't use leveldb anymore..
>
> [root@ceph1n021 ~]# ls  /var/lib/ceph/mon/ceph-ceph1n021/store.db/
> 3251327.ldb  5720254.ldb  6568574.ldb  6652800.ldb  6726397.ldb
>  6726468.ldb  6726623.ldb  6726631.ldb  6726638.ldb  6726646.ldb
>  6726653.ldb  IDENTITY
> 3251520.ldb  6497196.ldb  6575398.ldb  6654280.ldb  6726398.ldb
>  6726469.ldb  6726624.ldb  6726632.ldb  6726639.ldb  6726647.ldb
>  6726654.ldb  LOCK
> 3251566.ldb  6517010.ldb  6595757.ldb  6680000.ldb  6726399.ldb
>  6726588.ldb  6726627.ldb  6726634.ldb  6726642.ldb  6726648.ldb
>  6726655.ldb  MANIFEST-5682438
> 3251572.ldb  6523701.ldb  6601653.ldb  6699521.ldb  6726400.ldb
>  6726608.ldb  6726628.ldb  6726635.ldb  6726643.ldb  6726649.ldb
>  6726656.ldb  OPTIONS-000005
> 3251583.ldb  6543819.ldb  6624261.ldb  6706116.ldb  6726401.ldb
>  6726618.log  6726629.ldb  6726636.ldb  6726644.ldb  6726650.ldb
>  6726657.ldb
> 3251584.ldb  6549696.ldb  6627961.ldb  6725307.ldb  6726467.ldb
>  6726622.ldb  6726630.ldb  6726637.ldb  6726645.ldb  6726651.ldb
>  CURRENT
>
> All other ceph-mon "store.db" folders contain only the expected files, like:
>
> [root@ceph1n020 ~]# ls -l  /var/lib/ceph/mon/ceph-ceph1n020/store.db/
> total 153252
> -rw-------. 1 ceph ceph 11230512 Sep 28 05:13 1040392.log
> -rw-------. 1 ceph ceph 67281589 Sep 28 05:11 1040394.sst
> -rw-------. 1 ceph ceph 40121324 Sep 28 05:11 1040395.sst
> -rw-------. 1 ceph ceph       16 Aug 19 06:29 CURRENT
> -rw-------. 1 ceph ceph       37 Feb 21  2022 IDENTITY
> -rw-r--r--. 1 ceph ceph        0 Feb 21  2022 LOCK
> -rw-------. 1 ceph ceph  8465618 Sep 28 05:11 MANIFEST-898389
> -rw-------. 1 ceph ceph     4946 Aug 19 04:51 OPTIONS-898078
> -rw-------. 1 ceph ceph     4946 Aug 19 06:29 OPTIONS-898392
>
>
>     "mon": {
>         "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 3,
>         "ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)": 2    (ceph-mon@ceph1n011 and ceph-mon@ceph1n012)
>
> Is it safe to go forward and restart the rest of these monitors (ceph1n019
> and ceph1n020), and what can we do to fix the errors on ceph-mon@ceph1n021?
>
> Best regards,
> Christoph
>
>
>
> Christoph Ackermann | System Engineer
> INFOSERVE GmbH | Am Felsbrunnen 15 | D-66119 Saarbrücken
> Fon +49 (0)681 88008-59 | Fax +49 (0)681 88008-33 |
c.ackermann@xxxxxxxxxxxx
> | www.infoserve.de
> INFOSERVE Datenschutzhinweise: www.infoserve.de/datenschutz
> Handelsregister: Amtsgericht Saarbrücken, HRB 11001 | Erfüllungsort:
> Saarbrücken
> Geschäftsführer: Dr. Stefan Leinenbach | Ust-IdNr.: DE168970599
>
> <https://facebook.com/infoserve.de>
> <https://www.xing.com/companies/infoservegmbh>
> <https://www.youtube.com/channel/UCUj8C3TGGhQZPVvxu4woXmQ>
> <https://www.linkedin.com/company-beta/10095540>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



