Dear Benjeman, dear all,

indeed, after waiting a bit longer and restarting the mgr, it now works (for the single case where I temporarily had SELinux off)! So at least we now know the remaining issues with health metrics :-).

Cheers,
Oliver

On 07.11.19 at 18:51, Oliver Freyermuth wrote:
Dear Benjeman,

thanks! Indeed, it seems I have to do something similar to that to get

    ceph daemon osd.14 smart <device-id>

to work. For some reason, "ceph device get-health-metrics" and friends still get stuck for me, but maybe that would just need more time. Now I have to ponder whether to really apply this broad SELinux policy to all file servers or rather wait for an SELinux expert to provide something more granular...

Thanks for linking the bug report, I have subscribed to it immediately :-).

Cheers and thanks,
Oliver

On 04.11.19 at 14:37, Benjeman Meekhof wrote:

Hi Oliver,

The ceph-osd RPM packages include a config in /etc/sudoers.d/ceph-osd-smartctl that looks something like this:

    ceph ALL=NOPASSWD: /usr/sbin/smartctl -a --json /dev/*
    ceph ALL=NOPASSWD: /usr/sbin/nvme * smart-log-add --json /dev/*

If you are using SELinux you will have to adjust capabilities there as well. I think we did something kind of similar to what is attached to this tracker issue: https://tracker.ceph.com/issues/40683

That seemed to get us as far as hosts being able to report disk health to the module.
thanks,
Ben

On Sat, Nov 2, 2019 at 11:38 PM Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:

Dear Cephers,

I went through some of the OSD logs of our 14.2.4 nodes and found this:
----------------------------------
Nov 01 01:22:25 sudo[1087697]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a --json /dev/sds
Nov 01 01:22:51 sudo[1087729]: pam_unix(sudo:auth): conversation failed
Nov 01 01:22:51 sudo[1087729]: pam_unix(sudo:auth): auth could not identify password for [ceph]
Nov 01 01:22:51 sudo[1087729]: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "ceph"
Nov 01 01:22:53 sudo[1087729]: ceph : command not allowed ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=nvme lvm smart-log-add --json /dev/sds
----------------------------------
It seems with device health metrics, the OSDs try to run smartctl with "sudo", which expectedly fails, since the Ceph user (as system user) has a uid smaller than 1000. Also, it's of course not in /etc/sudoers.

Does somebody have a working setup with device health metrics which could be shared (and documented, or made part of future packaging ;-) )?

Cheers,
Oliver
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
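The sudoers rules quoted in Benjeman's reply can be staged and syntax-checked before they go anywhere near /etc/sudoers.d/. The following is only a minimal sketch under stated assumptions: the scratch file from mktemp and the variable names rules_file and rule_count are illustrative and not part of the ceph-osd packaging, and the two rules are copied verbatim from the thread rather than from a particular package version.

```shell
#!/bin/sh
# Sketch: write the two sudoers rules quoted in the thread to a scratch file
# and syntax-check them. Nothing is installed into /etc/sudoers.d/ here.
set -eu

# Hypothetical scratch location, not the packaged path.
rules_file="$(mktemp)"

# Rules as quoted in the thread (quoted heredoc, so '*' is not expanded).
cat > "$rules_file" <<'EOF'
ceph ALL=NOPASSWD: /usr/sbin/smartctl -a --json /dev/*
ceph ALL=NOPASSWD: /usr/sbin/nvme * smart-log-add --json /dev/*
EOF

# visudo -c -f only parses the given file; it does not install it.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$rules_file" || echo "visudo reported a problem" >&2
else
    echo "visudo not found; skipping syntax check" >&2
fi

# Count the staged rules as a simple sanity check.
rule_count="$(grep -c '^ceph ALL=NOPASSWD:' "$rules_file")"
echo "staged $rule_count rule(s) in $rules_file"
```

Once such rules are in place on an OSD host, a non-interactive run as the ceph user, for example `sudo -u ceph sudo -n /usr/sbin/smartctl -a --json /dev/sdX` with /dev/sdX as a placeholder device, should show whether sudo still asks for a password. As the thread notes, SELinux may block the call even when the sudoers rules are correct.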