Fwd: Ceph Upgrade Issue - Luminous to Nautilus (14.2.11) using ceph-ansible

Hi All,

We encountered an issue while upgrading our Ceph cluster from Luminous
12.2.12 to Nautilus 14.2.11. We used
https://docs.ceph.com/docs/master/releases/nautilus/#upgrading-from-mimic-or-luminous
and ceph-ansible to upgrade the cluster. We use HDDs for data and NVMe for
WAL and DB.
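
For reference, the manual steps from that document that we are following
around the ceph-ansible runs look roughly like this (abridged; see the
linked page for the full procedure):

ceph osd set noout

# upgrade and restart ceph-mon on each monitor, then the ceph-mgr daemons,
# then ceph-osd on each OSD node; verify the versions with:
ceph versions

# once every daemon is on 14.2.x
ceph osd require-osd-release nautilus
ceph osd unset noout
ceph mon enable-msgr2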

*Cluster Background:*
HP DL360
24 x 3.6T SATA
2 x 1.6T NVMe for journal (WAL/DB)
osd_scenario: non-collocated
current version: Luminous 12.2.12 & 12.2.5
type: bluestore

The upgrade went well for the MONs (though I had to overcome the systemd
masking issues). While testing the OSD upgrade on one OSD node, we
encountered an issue with the OSD daemons failing quickly after startup.
After comparing and checking the block device mappings, everything looked
fine. The node had been up for more than 700 days, so I decided to do a
clean reboot. After that I noticed the mount points were completely
missing, and ceph-disk is no longer part of Nautilus. I had to manually
mount the partitions after checking the disk partitions and the whoami
information. After manually mounting osd.108, it is now throwing a
permission error which I'm still reviewing (bdev(0xd1be000
/var/lib/ceph/osd/ceph-108/block) open open got: (13) Permission denied).
The full OSD log is available for review at https://pastebin.com/7k0xBfDV.
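
For reference, here is roughly how I am checking ownership and the
BlueStore label on osd.108 (the device paths in the chown line are
placeholders for our actual block and WAL/DB partitions):

# check who owns the mount point and the resolved block/block.wal devices
ls -ld /var/lib/ceph/osd/ceph-108
ls -lL /var/lib/ceph/osd/ceph-108/block /var/lib/ceph/osd/ceph-108/block.wal

# the OSD runs as the ceph user, so the data dir and the underlying
# partitions need to be ceph:ceph (chown -R does not follow the block
# symlinks, hence the separate chown on the partitions)
chown -R ceph:ceph /var/lib/ceph/osd/ceph-108
chown ceph:ceph /dev/sdc2 /dev/nvme0n1p3   # placeholder partitions

# confirm the BlueStore label is readable again once permissions are fixed
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-108/block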

*Questions*:
What could have gone wrong here, and how can we fix it?
Do we need to migrate the Luminous cluster from ceph-disk to ceph-volume
before attempting the upgrade, or is there another best practice we should
follow? (See the ceph-volume sketch after these questions.)
What's the best upgrade method using ceph-ansible to move from Luminous to
Nautilus? A manual upgrade of ceph-ansible?
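
On the ceph-disk question, the takeover path I understand Nautilus provides
for existing ceph-disk OSDs is ceph-volume's "simple" mode, run per OSD
host, roughly like this (not yet verified on this cluster):

# record each existing ceph-disk OSD as JSON under /etc/ceph/osd/
ceph-volume simple scan

# mount and enable the OSDs from that metadata, without ceph-disk/udev
ceph-volume simple activate --all

On the ceph-ansible side, my understanding is that the stable-4.0 branch
(with the rolling_update.yml playbook) is the one meant for Nautilus, but
I'd appreciate confirmation.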

We have also started thinking about the Octopus release, which uses
containers: what is the best transition path for the long run? We don't
want to destroy and rebuild the entire cluster; we could do one node at a
time, but that would be a very lengthy process for 2500+ systems across 13
clusters. Looking for help and expert comments on the transition path.
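
On the Octopus/containers question, the path I have been reading about is
cephadm adoption of an existing, non-containerized cluster one daemon at a
time, roughly like this (daemon names are just examples from this node; I
have not tried it at our scale):

# convert existing systemd-managed daemons into cephadm-managed containers
cephadm adopt --style legacy --name mon.pistore-as-b03
cephadm adopt --style legacy --name osd.108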

Any help would be greatly appreciated.

OSD log excerpt
================
2020-08-27 14:41:01.132 7f0e0ebf2c00  0 bdev(0xb7e2a80
/var/lib/ceph/osd/ceph-108/block.wal) ioctl(F_SET_FILE_RW_HINT) on
/var/lib/ceph/osd/ceph-108/block.wal failed: (22) Invalid argument
2020-08-27 14:41:01.132 7f0e0ebf2c00  1 bdev(0xb7e2a80
/var/lib/ceph/osd/ceph-108/block.wal) open size 1073741824 (0x40000000, 1
GiB) block_size 4096 (4 KiB) non-rotational discard supported
2020-08-27 14:41:01.132 7f0e0ebf2c00  1 bluefs add_block_device bdev 0 path
/var/lib/ceph/osd/ceph-108/block.wal size 1 GiB
2020-08-27 14:41:01.132 7f0e0ebf2c00  0  set rocksdb option
compaction_style = kCompactionStyleLevel
2020-08-27 14:41:01.132 7f0e0ebf2c00 -1 rocksdb: Invalid argument: Can't
parse option compaction_threads
2020-08-27 14:41:01.136 7f0e0ebf2c00 -1
/build/ceph-14.2.11/src/os/bluestore/BlueStore.cc: In function 'int
BlueStore::_upgrade_super()' thread 7f0e0ebf2c00 time 2020-08-27
14:41:01.135973
/build/ceph-14.2.11/src/os/bluestore/BlueStore.cc: 10249: FAILED
ceph_assert(ondisk_format > 0)

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x152) [0x846368]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*,
char const*, ...)+0) [0x846543]
 3: (BlueStore::_upgrade_super()+0x4b6) [0xd62346]
 4: (BlueStore::_mount(bool, bool)+0x592) [0xdb0b52]
 5: (OSD::init()+0x3f3) [0x8f5483]
 6: (main()+0x27e2) [0x84c462]
 7: (__libc_start_main()+0xf0) [0x7f0e0bda3830]
 8: (_start()+0x29) [0x880389]

journalctl -xu log
================
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.309
7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open
got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open
got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bluestore(/var/lib/ceph/osd/ceph-108/block)
_read_bdev_label failed to op
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 bdev(0xd1be000 /var/lib/ceph/osd/ceph-108/block) open open
got: (13) Perm
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1 osd.108 0 OSD:init: unable to mount object store
Aug 27 20:18:39 pistore-as-b03 ceph-osd[345903]: 2020-08-27 20:18:39.317
7fe9410bfc00 -1  ** ERROR: osd init failed: (13) Permission denied
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Main
process exited, code=exited, status=1/FAILURE
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Unit
entered failed state.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Failed
with result 'exit-code'.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Service
hold-off time over, scheduling restart.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: Stopped Ceph object storage
daemon osd.108.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: ceph-osd@108.service: Start
request repeated too quickly.
Aug 27 20:18:39 pistore-as-b03 systemd[1]: Failed to start Ceph object
storage daemon osd.108.
Aug 27 20:18:40 pistore-as-b03 systemd[1]:
[/lib/systemd/system/ceph-osd@.service:15]
Unknown lvalue 'LockPersonality' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]:
[/lib/systemd/system/ceph-osd@.service:16]
Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]:
[/lib/systemd/system/ceph-osd@.service:19]
Unknown lvalue 'ProtectControlGroups' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]:
[/lib/systemd/system/ceph-osd@.service:21]
Unknown lvalue 'ProtectKernelModules' in section 'Service'
Aug 27 20:18:40 pistore-as-b03 systemd[1]:
[/lib/systemd/system/ceph-osd@.service:23]
Unknown lvalue 'ProtectKernelTunables' in section 'Service'

-- 
Regards,
Suresh

Note: Resent, as the original was large and waiting on moderator approval.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


