Hi All,

We encountered an issue while upgrading our Ceph cluster from Luminous 12.2.12 to Nautilus 14.2.11. We followed https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous and used ceph-ansible to drive the upgrade. We use HDDs for data and NVMe for WAL and DB.

*Cluster Background:*
HP DL360
24 x 3.6 TB SATA
2 x 1.6 TB NVMe for WAL/DB
osd_scenario: non-collocated
current version: Luminous 12.2.12 & 12.2.5
type: bluestore

The upgrade went well for the MONs (though I had to work around the systemd masking issues). While testing the OSD upgrade on one OSD node, we hit OSD daemons failing shortly after startup. After comparing and checking the block device mappings, everything looked fine. The node had been up for 700+ days, so I decided to do a clean reboot. After that I noticed the mount points were completely missing, and ceph-disk is no longer part of Nautilus, so I had to mount the partitions manually after checking the disk partitions and the whoami information (rough commands below). With osd.108 mounted manually, it now throws a permission error that I'm still reviewing: bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open got: (13) Permission denied. The full OSD log is at https://pastebin.com/7k0xBfDV; excerpts follow below.

*Questions:*
1. What could have gone wrong here, and how can we fix it?
2. Do we need to migrate the Luminous cluster from ceph-disk to ceph-volume before attempting the upgrade, or is there another best practice we should follow?
3. What is the best method for moving from Luminous to Nautilus using ceph-ansible, or is a manual upgrade preferable?
4. Thinking ahead to the Octopus release, which runs in containers: what is the best transition path for the long run? We don't want to destroy and rebuild the entire cluster; we could do one node at a time, but that would be a very lengthy process for 2500+ systems across 13 clusters.

Looking for help and expert comments on the transition path. Any help would be greatly appreciated.
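For reference, here is roughly how I brought osd.108 back by hand. Device names are illustrative examples from my node (found via lsblk/blkid), not exact. My working theory is that ceph-disk's udev rules used to set the device ownership at boot and are gone from the Nautilus packages, so after the clean reboot the partitions came back owned by root:

# mount the small ceph-disk data partition (illustrative device name)
mount /dev/sdh1 /var/lib/ceph/osd/ceph-108
cat /var/lib/ceph/osd/ceph-108/whoami        # prints 108, confirming the mapping
ls -l /var/lib/ceph/osd/ceph-108/            # shows block, block.db, block.wal symlinks
# ceph-osd runs as ceph:ceph, so the symlinks and the raw partitions they
# point to need matching ownership (symlink targets here are illustrative)
chown -h ceph:ceph /var/lib/ceph/osd/ceph-108/block*
chown ceph:ceph /dev/sdh2 /dev/nvme0n1p3 /dev/nvme0n1p4

On question 2, my understanding (please correct me if wrong) is that Nautilus' ceph-volume has a "simple" mode that can take over ceph-disk OSDs, which should also let them survive reboots without the old udev rules:

ceph-volume simple scan /dev/sdh1    # captures the OSD metadata into a JSON file under /etc/ceph/osd/
ceph-volume simple activate --all    # mounts the OSDs and enables systemd units from that metadata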
OSD log excerpt
===============
2020-08-27 14:41:01.132 7f0e0ebf2c00 0 bdev(0xb7e2a80 /var/lib/ceph/osd/ceph-108/block.wal) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-108/block.wal failed: (22) Invalid argument
2020-08-27 14:41:01.132 7f0e0ebf2c00 1 bdev(0xb7e2a80 /var/lib/ceph/osd/ceph-108/block.wal) open size 1073741824 (0x40000000, 1 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-08-27 14:41:01.132 7f0e0ebf2c00 1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-108/block.wal size 1 GiB
2020-08-27 14:41:01.132 7f0e0ebf2c00 0 set rocksdb option compaction_style = kCompactionStyleLevel
2020-08-27 14:41:01.132 7f0e0ebf2c00 -1 rocksdb: Invalid argument: Can't parse option compaction_threads
2020-08-27 14:41:01.136 7f0e0ebf2c00 -1 /build/ceph-14.2.11/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7f0e0ebf2c00 time 2020-08-27 14:41:01.135973
/build/ceph-14.2.11/src/os/bluestore/BlueStore.cc: 10249: FAILED ceph_assert(ondisk_format > 0)
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x846368]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x846543]
 3: (BlueStore::_upgrade_super()+0x4b6) [0xd62346]
 4: (BlueStore::_mount(bool, bool)+0x592) [0xdb0b52]
 5: (OSD::init()+0x3f3) [0x8f5483]
 6: (main()+0x27e2) [0x84c462]
 7: (__libc_start_main()+0xf0) [0x7f0e0bda3830]
 8: (_start()+0x29) [0x880389]
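One more data point: the "rocksdb: Invalid argument: Can't parse option compaction_threads" line above suggests our ceph.conf carries a Luminous-era bluestore_rocksdb_options string with a key that the RocksDB bundled with Nautilus no longer accepts. Something like this would trigger it (illustrative values, not our exact string):

[osd]
# Luminous-era tuning (illustrative); Nautilus' RocksDB fails when it
# hits a key it cannot parse, such as compaction_threads
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,compaction_threads=8

If that is the cause, removing the unparseable key should at least clear the RocksDB error; whether it also explains the ondisk_format assert above, I'm not sure.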
Journalctl -xu log
==================
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309 7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317 7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317 7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block) _read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317 7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317 7fe9410bfc00 -1 osd.108 0 OSD:init: unable to mount object store
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317 7fe9410bfc00 -1 ** ERROR: osd init failed: (13) Permission denied
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Main process exited, code=exited, status=1/FAILURE
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Unit entered failed state.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Failed with result 'exit-code'.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Service hold-off time over, scheduling restart.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: Stopped Ceph object storage daemon osd.108.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Start request repeated too quickly.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: Failed to start Ceph object storage daemon osd.108.
Aug 27 20:18:40 pistore-as-b03 systemd[1]: [/lib/systemd/system/ceph-osd@.service:15] Unknown lvalue 'LockPersonality' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]: [/lib/systemd/system/ceph-osd@.service:16] Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]: [/lib/systemd/system/ceph-osd@.service:19] Unknown lvalue 'ProtectControlGroups' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]: [/lib/systemd/system/ceph-osd@.service:21] Unknown lvalue 'ProtectKernelModules' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]: [/lib/systemd/system/ceph-osd@.service:23] Unknown lvalue 'ProtectKernelTunables' in section 'Service'

--
Regards,
Suresh

Note: resent, as the original message was too large and is still awaiting moderator approval.