Re: Forced upgrade OSD from Luminous to Pacific

Alex, 

First thing that comes to mind when seeing these logs suggesting a version incompatibility is that you may have forgotten to run some commands (usually listed in the release notes) after each major version upgrade, such as setting flags (sortbitwise, recovery_deletes, purged_snapdirs, pglog_hardlimit) or setting the required release after each upgrade (require_osd_release, require_min_compat_client). 
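
From memory, those commands look roughly like this; please check the actual Luminous -> Nautilus -> Pacific upgrade notes before running anything, and note that recovery_deletes/purged_snapdirs are, if I remember correctly, set automatically once all OSDs run the new release:

    ceph osd set sortbitwise
    ceph osd set pglog_hardlimit            # needs all OSDs on >= 12.2.11 first
    ceph osd require-osd-release nautilus   # after the Nautilus upgrade
    ceph osd require-osd-release pacific    # after the Pacific upgrade
    ceph osd set-require-min-compat-client luminous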

Can you post the result of 'ceph osd dump | head -13'? 

Maybe running a 'ceph osd require-osd-release pacific' would help here. 

Regards, 
Frédéric. 

----- On 9 Oct 24, at 12:26, Alex Rydzewski <rydzewski.al@xxxxxxxxx> wrote: 

> Hello, Frédéric!

> 1.
> First I repaired the mon while ceph was still Luminous, but it wouldn't start, with some
> error I don't remember. Then I upgraded ceph and repeated the repair procedure, then
> upgraded ceph again and repeated the restore procedure, and the mon started. Now I can
> query it.
> root@helper:~# ceph --version
> ceph version 16.2.15 (12fd9dfef6998ac41c93f56885264a7d43a51b03) pacific (stable)

> root@helper:~# ceph -s
> cluster:
> id: 96b6ff1d-25bf-403f-be3d-78c2fb0ff747
> health: HEALTH_WARN
> mon is allowing insecure global_id reclaim
> 2 osds down
> Reduced data availability: 351 pgs inactive
> 2 pool(s) have non-power-of-two pg_num
> 2 daemons have recently crashed

> services:
> mon: 1 daemons, quorum helper (age 21h)
> mgr: helper(active, since 21h)
> osd: 5 osds: 1 up, 3 in

> data:
> pools: 3 pools, 351 pgs
> objects: 0 objects, 0 B
> usage: 0 B used, 0 B / 0 B avail
> pgs: 100.000% pgs unknown
> 351 unknown

> root@helper:~# ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 18.19298 root default
> -3 18.19298 host helper
> 0 hdd 3.63860 osd.0 down 0 1.00000
> 1 hdd 3.63860 osd.1 up 1.00000 1.00000
> 2 hdd 3.63860 osd.2 down 0 1.00000
> 3 hdd 3.63860 osd.3 down 1.00000 1.00000
> 4 hdd 3.63860 osd.4 down 1.00000 1.00000

> Although it reports this state, no OSDs are actually connected to it.

> root@helper:~# tail /var/log/ceph/ceph-osd.1.log
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s
> read, 0.0 seconds
> Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s
> read, 0.0 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0
> level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for
> pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0
> memtable_compaction, 0 memtable_slowdown, interval 0 total count

> ** File Read Latency Histogram By Level [P] **

> 2024-10-09T13:17:09.716+0300 7eff91d8a700 1 osd.1 45887 tick checking mon for
> new map
> 2024-10-09T13:17:39.864+0300 7eff91d8a700 1 osd.1 45887 tick checking mon for
> new map

> 2. Yes, I upgraded MON and OSDs to Pacific

> root@helper:~# ceph-osd --version
> ceph version 16.2.15 (12fd9dfef6998ac41c93f56885264a7d43a51b03) pacific (stable)
> root@helper:~# ceph-mon --version
> ceph version 16.2.15 (12fd9dfef6998ac41c93f56885264a7d43a51b03) pacific (stable)

> 3.
> Yes, the MON now starts and the OSDs start, but they cannot connect to the MON. At the
> same time, the MON log shows the message:
> disallowing boot of octopus+ OSD osd.xx

> And I tried rebuilding the MON with this ceph (Pacific) version, and it is running
> now.

> On 09.10.24 12:35, Frédéric Nass wrote:

>> ----- On 8 Oct 24, at 15:24, Alex Rydzewski <rydzewski.al@xxxxxxxxx> wrote:

>>> Hello, dear community!

>>> I kindly ask for your help in resolving my issue.

>>> I have a server with a single-node CEPH setup with 5 OSDs. This server
>>> has been powered off for about two years, and when I needed the data
>>> from it, I found that the SSD where the system was installed had died.

>>> I tried to recover the cluster. First, assuming the old CEPH was still there, I
>>> installed Debian 10 with CEPH 12.2.11, mounted the OSDs to
>>> /var/lib/ceph/osd/ceph-xx and assembled the monitor, as described here:
>>> https://forum.proxmox.com/threads/recover-ceph-from-osds-only.113699/
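
>>> If it matters, that guide is essentially the usual "recover the MON store from
>>> the OSDs" recipe from the Ceph docs; roughly the following, where the paths are
>>> just examples and the exact flags differ a bit between releases:
>>>
>>> # collect cluster maps from every OSD into a temporary mon store
>>> for osd in /var/lib/ceph/osd/ceph-*; do
>>>     ceph-objectstore-tool --data-path "$osd" --no-mon-config \
>>>         --op update-mon-db --mon-store-path /root/mon-store
>>> done
>>> # rebuild a mon store from it, injecting the admin keyring
>>> ceph-monstore-tool /root/mon-store rebuild -- --keyring /path/to/admin.keyring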

>>> However, the monitor wouldn't start, giving an error I don't remember.
>>> Then I made a series of mistakes, upgrading the system and CEPH first to
>>> Nautilus and then to Pacific. Eventually, I managed to start the
>>> monitor, but a compatibility issue with the OSDs remains.

>>> When the OSDs start, I see the message: check_osdmap_features
>>> require_osd_release unknown -> luminous
>>> At the same time, the monitor log shows: disallowing boot of octopus+
>>> OSD osd.xx
>>> After starting, the OSD remains in the state: tick checking mon for
>>> new map

>> Hi Alex,

>> Correct me if I got this wrong:

>> 1. You repaired the MON database while OSDs were still on Luminous
>> 2. You upgraded MONs and OSDs to Pacific
>> 3. MONs now start but won't allow Pacific OSDs to join the cluster

>> Have you tried repairing the MON database again, now that the OSDs are running
>> Pacific? (Make sure to back up the previously repaired MON database before
>> attempting this.)
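
>> For the backup, something like this would do, assuming the default mon path for a
>> mon named "helper" (adjust to your actual paths):
>>
>> systemctl stop ceph-mon@helper
>> cp -a /var/lib/ceph/mon/ceph-helper /root/ceph-mon-helper.bak.$(date +%F)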

>> Regards,
>> Frédéric

>>> Then I enabled the msgr v2 protocol and tried enabling RocksDB sharding for
>>> the OSD, as described here:
>>> https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#bluestore-rocksdb-sharding
>>> but it didn't help.
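
>>> The reshard command on that page is roughly the following, with the default
>>> Pacific sharding definition; the OSD path below is just an example from my setup:
>>>
>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 \
>>>     --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
>>>     reshard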

>>> Attempts to start the OSD with lower versions of CEPH, even with Octopus,
>>> end with the error:
>>> 2024-10-08 10:45:38.402975 7fba61b34ec0 -1 bluefs _replay 0x0: stop:
>>> unrecognized op 12
>>> 2024-10-08 10:45:38.402992 7fba61b34ec0 -1 bluefs mount failed to replay
>>> log: (5) Input/output error

>>> So, currently, I have CEPH 16.2.15, and the OSD is in the following state:

>>> /"/var/lib/ceph/osd/ceph-1/block": {
>>>     "osd_uuid": "2bb56721-28c7-45cc-9344-6cc5c699a642",
>>>     "size": 4000681103360,
>>>     "btime": "2018-06-02 13:16:57.042205",
>>>     "description": "main",
>>>     "bfm_blocks": "61045632",
>>>     "bfm_blocks_per_key": "128",
>>>     "bfm_bytes_per_block": "65536",
>>>     "bfm_size": "4000681099264",
>>>     "bluefs": "1",
>>>     "ceph_fsid": "96b6ff1d-25bf-403f-be3d-78c2fb0ff747",
>>>     "kv_backend": "rocksdb",
>>>     "magic": "ceph osd volume v026",
>>>     "mkfs_done": "yes",
>>>     "ready": "ready",
>>>     "require_osd_release": "12",
>>>     "whoami": "1"
>>> }

>>> with RocksDB modified to enable sharding.
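
>>> That label dump comes from something like this (device path from my setup):
>>>
>>> ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-1/block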

>>> Please advise: is there a way to upgrade such OSDs so they can run
>>> with this version of Ceph?

>>> If you need more information here, let me know and I will provide
>>> whatever is needed.

>>> --
>>> Alexander Rydzewski
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx

> --
> Олександр Ридзевський
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



