On Wed, 22 Jan 2020 at 18:01, Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:
After upgrading to Nautilus 14.2.6 from Luminous 12.2.12 we are seeing the following behavior on OSDs which were created with "ceph-volume lvm create --filestore --osd-id <osd> --data <device> --journal <journal>".

Upon restart of the server containing these OSDs they fail to start with the following error in the logs:

2020-01-21 13:36:11.635 7fee633e8a80 -1 filestore(/var/lib/ceph/osd/ceph-199) mount(1928): failed to open journal /var/lib/ceph/osd/ceph-199/journal: (13) Permission denied

/var/lib/ceph/osd/ceph-199/journal symlinks to /dev/sdc5 in our case and inspecting the ownership on /dev/sdc5 it is root:root, chowning that to ceph:ceph causes the osd to start and come back up and in near instantly.

As a note these OSDs we experience this with are OSDs which have previously failed and been replaced using the above ceph-volume, longer running OSDs in the same server created with ceph-disk or ceph-volume simple (that have a corresponding .json in /etc/ceph/osd) start up fine and get ceph:ceph on their journal partition. Bluestore OSDs also do not have any issue.

My hope is that I can preemptively fix these OSDs before shutting them down so that reboots happen seamlessly. Thanks for any insight.

Our workaround (not on Nautilus, but still) is to add a chown to the pre-start script that the systemd unit file points to, like this:
more /usr/lib/systemd/system/ceph-osd\@.service
...
ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh <more args here>
Then in that script, after it has figured out what your journal should be (even if it is a symlink), do a chown to ceph:ceph:
more /usr/lib/ceph/ceph-osd-prestart.sh
...
journal="$data/journal"
chown --dereference ceph:ceph "$journal"
so that the journal has the correct ownership before the filestore OSD gets started.
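
A slightly more defensive version of that chown step might look like the sketch below. The function name fix_journal_owner and its optional owner argument are made up for illustration and are not part of the shipped ceph-osd-prestart.sh. It assumes GNU chown (for --dereference) and that an OSD without a journal entry should simply be skipped.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: make sure the journal that an
# OSD data directory points to is owned by the ceph user.
#   $1 = OSD data directory, e.g. /var/lib/ceph/osd/ceph-199
#   $2 = owner to set (optional, defaults to ceph:ceph)
fix_journal_owner() {
    # Nothing to do if there is no journal entry in the data directory,
    # or if the symlink is dangling (-e follows symlinks).
    [ -e "$1/journal" ] || return 0

    # --dereference changes the target of the symlink (the partition),
    # not the symlink itself.
    chown --dereference "${2:-ceph:ceph}" "$1/journal"
}
```

Inside the prestart script it would be called as fix_journal_owner "$data", replacing the bare chown line.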
May the most significant bit of your life be positive.