Re: Device Health Metrics on EL 7

Dear Benjeman, dear all,

indeed, after waiting a bit longer and restarting the mgr, it now works
(in the single case where I had temporarily turned SELinux off)!

So at least we now know the remaining issues with health metrics :-).

Cheers,
	Oliver

On 07.11.19 at 18:51, Oliver Freyermuth wrote:
Dear Benjeman,

thanks! Indeed, it seems I have to do something similar to that to get:
  ceph daemon osd.14 smart <device-id>
to work. For some reason, "ceph device get-health-metrics" and friends still get stuck for me,
but maybe that would just need more time.

Now I have to ponder whether to really apply this broad SELinux policy to all file servers or rather wait for an SELinux expert to provide something more granular...
Thanks for linking the bug report, I have subscribed to it immediately :-).

Cheers and thanks,
     Oliver

On 04.11.19 at 14:37, Benjeman Meekhof wrote:
Hi Oliver,

The ceph-osd RPM packages include a config in
/etc/sudoers.d/ceph-osd-smartctl that looks something like this:
ceph ALL=NOPASSWD: /usr/sbin/smartctl -a --json /dev/*
ceph ALL=NOPASSWD: /usr/sbin/nvme * smart-log-add --json /dev/*
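
A quick way to see whether such a rule is present on a host is to grep the drop-in file. This is only a minimal sketch: the function name has_smartctl_rule is made up for illustration, and it does a plain text match against the rule quoted above rather than evaluating the file the way sudo itself does.

```shell
# Hypothetical helper: succeed if the given sudoers file contains a
# passwordless smartctl rule for the ceph user (text match only).
has_smartctl_rule() {
    grep -Eq '^ceph[[:space:]]+ALL=NOPASSWD:[[:space:]]*/usr/sbin/smartctl ' "$1"
}

# Example use (reading the drop-in typically needs root):
#   has_smartctl_rule /etc/sudoers.d/ceph-osd-smartctl && echo "rule present"
# For an authoritative answer, ask sudo itself (also as root):
#   visudo -cf /etc/sudoers.d/ceph-osd-smartctl   # syntax check of the file
#   sudo -l -U ceph                               # what the ceph user may run
```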

If you are using SELinux you will have to adjust capabilities there as
well.  I think we did something kind of similar to what is attached to
this tracker issue:
https://tracker.ceph.com/issues/40683

That seemed to get us as far as hosts being able to report disk health
to the module.

thanks,
Ben



On Sat, Nov 2, 2019 at 11:38 PM Oliver Freyermuth
<freyermuth@xxxxxxxxxxxxxxxxxx> wrote:

Dear Cephers,

I went through some of the OSD logs of our 14.2.4 nodes and found this:
----------------------------------
Nov 01 01:22:25  sudo[1087697]:     ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a --json /dev/sds
Nov 01 01:22:51  sudo[1087729]: pam_unix(sudo:auth): conversation failed
Nov 01 01:22:51  sudo[1087729]: pam_unix(sudo:auth): auth could not identify password for [ceph]
Nov 01 01:22:51  sudo[1087729]: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "ceph"
Nov 01 01:22:53  sudo[1087729]:     ceph : command not allowed ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=nvme lvm smart-log-add --json /dev/sds
----------------------------------
It seems that with device health metrics, the OSDs try to run smartctl via "sudo", which fails as expected, since the ceph user (being a system user) has a UID below 1000.
Also, it is of course not in /etc/sudoers.
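
To find every command sudo refused for the ceph user, log lines like the ones above can be filtered. The following is only a sketch: the function name denied_sudo_commands is hypothetical, and it assumes the log format matches the excerpt quoted here.

```shell
# Hypothetical helper: read sudo log lines on stdin and print each command
# that was refused for the ceph user, with a count per distinct command.
denied_sudo_commands() {
    grep -E 'sudo\[[0-9]+\]: +ceph : command not allowed' \
        | sed -E 's/.*COMMAND=//' \
        | sort | uniq -c
}

# Example use on a systemd host (time range is just an illustration):
#   journalctl -t sudo --since "2019-11-01" | denied_sudo_commands
```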

Does somebody have a working setup with device health metrics which could be shared (and documented, or made part of future packaging ;-) ) ?

Cheers,
         Oliver

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx