Re: Forced upgrade OSD from Luminous to Pacific

Hello, Frédéric!

1.
First, I repaired the MON while Ceph was still Luminous, but it wouldn't start, with some error I don't remember. Then I upgraded Ceph, repeated the repair procedure, and the MON started. Now I can query it.
root@helper:~# ceph --version
ceph version 16.2.15 (12fd9dfef6998ac41c93f56885264a7d43a51b03) pacific (stable)

root@helper:~# ceph -s
  cluster:
    id:     96b6ff1d-25bf-403f-be3d-78c2fb0ff747
    health: HEALTH_WARN
            mon is allowing insecure global_id reclaim
            2 osds down
            Reduced data availability: 351 pgs inactive
            2 pool(s) have non-power-of-two pg_num
            2 daemons have recently crashed

  services:
    mon: 1 daemons, quorum helper (age 21h)
    mgr: helper(active, since 21h)
    osd: 5 osds: 1 up, 3 in

  data:
    pools:   3 pools, 351 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             351 unknown

root@helper:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         18.19298  root default
-3         18.19298      host helper
 0    hdd   3.63860          osd.0      down         0  1.00000
 1    hdd   3.63860          osd.1        up   1.00000  1.00000
 2    hdd   3.63860          osd.2      down         0  1.00000
 3    hdd   3.63860          osd.3      down   1.00000  1.00000
 4    hdd   3.63860          osd.4      down   1.00000  1.00000

Despite the state shown above, no OSDs are actually connected to the MON.

root@helper:~# tail /var/log/ceph/ceph-osd.1.log
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [P] **

2024-10-09T13:17:09.716+0300 7eff91d8a700  1 osd.1 45887 tick checking mon for new map
2024-10-09T13:17:39.864+0300 7eff91d8a700  1 osd.1 45887 tick checking mon for new map

2. Yes, I upgraded MON and OSDs to Pacific

root@helper:~# ceph-osd --version
ceph version 16.2.15 (12fd9dfef6998ac41c93f56885264a7d43a51b03) pacific (stable)
root@helper:~# ceph-mon --version
ceph version 16.2.15 (12fd9dfef6998ac41c93f56885264a7d43a51b03) pacific (stable)

3.
Yes, the MON now starts and the OSDs start, but they cannot connect to the MON. At the same time, the MON log shows:

disallowing boot of octopus+ OSD osd.xx
I also tried rebuilding the MON with this Ceph version (Pacific), and it is running now.

On 09.10.24 12:35, Frédéric Nass wrote:
----- On 8 Oct 24, at 15:24, Alex Rydzewski <rydzewski.al@xxxxxxxxx> wrote:

Hello, dear community!

I kindly ask for your help in resolving my issue.

I have a server with a single-node CEPH setup with 5 OSDs. This server
has been powered off for about two years, and when I needed the data
from it, I found that the SSD where the system was installed had died.

I tried to recover the cluster. First, assuming the old Ceph version was
still in place, I installed Debian 10 with Ceph 12.2.11, mounted the OSDs at
/var/lib/ceph/osd/ceph-xx, and rebuilt the monitor as described here:
https://forum.proxmox.com/threads/recover-ceph-from-osds-only.113699/.

However, the monitor wouldn't start, giving an error I don't remember.
Then I made a series of mistakes, upgrading the system and CEPH first to
Nautilus and then to Pacific. Eventually, I managed to start the
monitor, but a compatibility issue with the OSDs remains.

When the OSDs start, I see the message:

check_osdmap_features require_osd_release unknown -> luminous

At the same time, the monitor log shows:

disallowing boot of octopus+ OSD osd.xx

After starting, the OSD remains in the state:

tick checking mon for new map
Hi Alex,

Correct me if I got this wrong:

1. You repaired the MON database while OSDs were still on Luminous
2. You upgraded MONs and OSDs to Pacific
3. MONs now start but won't allow Pacific OSDs to join the cluster

Have you tried repairing the MON database again, now that the OSDs are running Pacific? (Make sure to back up the previously repaired MON database before attempting this.)
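[For reference, the repair procedure discussed here appears to be the "Recovery using OSDs" MON-store rebuild from the Ceph troubleshooting docs. A minimal sketch, assuming a single host named "helper", the OSD paths from the outputs above, and a placeholder keyring path -- adjust everything to the actual layout:]

```shell
# Sketch only: rebuild the MON store from the (now Pacific) OSDs.

# 1. Back up the previously repaired MON database first, as suggested.
systemctl stop ceph-mon@helper
cp -a /var/lib/ceph/mon/ceph-helper /root/mon-backup-$(date +%F)

# 2. Extract cluster maps from every OSD into a fresh store.
ms=/root/mon-store
mkdir -p "$ms"
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" --no-mon-config \
        --op update-mon-db --mon-store-path "$ms"
done

# 3. Rebuild the store with a keyring that has full mon caps, then install it.
ceph-authtool /path/to/admin.keyring -n mon. --cap mon 'allow *'
ceph-monstore-tool "$ms" rebuild -- --keyring /path/to/admin.keyring
mv /var/lib/ceph/mon/ceph-helper/store.db /root/store.db.old
mv "$ms/store.db" /var/lib/ceph/mon/ceph-helper/store.db
chown -R ceph:ceph /var/lib/ceph/mon/ceph-helper
```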

Regards,
Frédéric


Then I enabled the msgr v2 protocol and tried enabling RocksDB sharding for
the OSD, as described here:
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#bluestore-rocksdb-sharding,
but it didn't help.
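[The resharding step on that page boils down to a single ceph-bluestore-tool invocation against a stopped OSD. A sketch, using the OSD path from the outputs above and the default Pacific sharding spec from the linked docs:]

```shell
# Sketch only: apply the default Pacific RocksDB sharding to a stopped OSD.
systemctl stop ceph-osd@1
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 \
    --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
    reshard
systemctl start ceph-osd@1
```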

Attempts to start the OSD with lower versions of Ceph, even with Octopus,
end with the error:

2024-10-08 10:45:38.402975 7fba61b34ec0 -1 bluefs _replay 0x0: stop: unrecognized op 12
2024-10-08 10:45:38.402992 7fba61b34ec0 -1 bluefs mount failed to replay log: (5) Input/output error


So, currently, I have CEPH 16.2.15, and the OSD is in the following state:

"/var/lib/ceph/osd/ceph-1/block": {
     "osd_uuid": "2bb56721-28c7-45cc-9344-6cc5c699a642",
     "size": 4000681103360,
     "btime": "2018-06-02 13:16:57.042205",
     "description": "main",
     "bfm_blocks": "61045632",
     "bfm_blocks_per_key": "128",
     "bfm_bytes_per_block": "65536",
     "bfm_size": "4000681099264",
     "bluefs": "1",
     "ceph_fsid": "96b6ff1d-25bf-403f-be3d-78c2fb0ff747",
     "kv_backend": "rocksdb",
     "magic": "ceph osd volume v026",
     "mkfs_done": "yes",
     "ready": "ready",
     "require_osd_release": "12",
     "whoami": "1"
}

with modified RocksDB to enable sharding.
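[The label above looks like show-label output; for anyone following along, it can be reproduced read-only like this -- the device path is taken from the output itself:]

```shell
# Dump the BlueStore label of this OSD (read-only, safe to run).
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-1/block
```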


Please advise: is there a way to upgrade such OSDs so they can run
with this version of Ceph?

If you need more information here, let me know and I will provide
whatever is needed.

--
Alexander Rydzewski
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Alexander Rydzewski
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



