We're running journals on NVMe as well (on SLES).
Before rebooting, try deleting the links here:
/etc/systemd/system/ceph-osd.target.wants/
If we delete them first, the node boots OK.
If we don't delete them, the disks sometimes don't come up and we have to run `ceph-disk activate-all`, as sketched below.
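Roughly like this - the ceph-osd@<id>.service names are just what our links look like, so check what's actually in that directory first:

    # see which OSD units are wired to start at boot
    ls -l /etc/systemd/system/ceph-osd.target.wants/
    # remove the links before rebooting (this is the step that works for us)
    rm /etc/systemd/system/ceph-osd.target.wants/ceph-osd@*.service
    reboot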
HTH. Thanks, Joe
>>> David Turner <drakonstein@xxxxxxxxx> 9/15/2017 9:54 AM >>>
I have this issue with my NVMe OSDs, but not my HDD OSDs. I have 15 HDDs and 2 NVMes in each host. We put most of the journals on one of the NVMes and a few on the second, but added a small OSD partition to the second NVMe for RGW metadata pools.
When restarting a server manually for testing, the NVMe OSD comes back up normally. We're also tracking down a problem where the OSD nodes freeze and we have to force-reboot them. After that, the NVMe OSD doesn't come back on its own until I run `ceph-disk activate-all`. This seems to track with your theory that a non-clean FS is part of the equation.
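For reference, the manual recovery after one of those hard reboots is just something like the following (the `ceph osd tree` check is only how I confirm the OSDs are back, not part of the fix):

    # bring up any OSD data partitions that didn't get activated at boot
    ceph-disk activate-all
    # confirm nothing is still marked down
    ceph osd tree | grep -i down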
Are there any ideas yet on how to resolve this? So far being able to run `ceph-disk activate-all` is good enough, but it's a bit of a nuisance.
On Fri, Sep 15, 2017 at 11:48 AM Matthew Vernon <mv3@xxxxxxxxxxxx> wrote:
Hi,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com