Excellent sleuthing.

I was able to change the key on my Zabbix server instance and all is happy in Ceph again.

Thanks,
Reed

> On Dec 11, 2019, at 10:24 AM, Gary Molenkamp <molenkam@xxxxxx> wrote:
>
> I dislike replying to my own post, but I found the issue:
>
> Looking at the changelog for 14.2.5, the zabbix key
> ceph.num_pg_wait_backfill has been renamed to ceph.num_pg_backfill_wait.
>
> This needs to be updated in the zabbix_template.yml
>
> Before the change:
>
> # /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s
> controller02.mgmt.cloud -p 10051 -k ceph.num_pg_backfill_wait -o 0
> Response from "controller03.mgmt.cloud:10051": "processed: 0; failed: 1;
> total: 1; seconds spent: 0.000033"
> sent: 1; skipped: 0; total: 1
> # /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s
> controller02.mgmt.cloud -p 10051 -k ceph.num_pg_wait_backfill -o 0
> Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0;
> total: 1; seconds spent: 0.000059"
> sent: 1; skipped: 0; total: 1
>
> After the key update:
>
> # /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s
> controller02.mgmt.cloud -p 10051 -k ceph.num_pg_backfill_wait -o 0
> Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0;
> total: 1; seconds spent: 0.000053"
> sent: 1; skipped: 0; total: 1
> # /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s
> controller02.mgmt.cloud -p 10051 -k ceph.num_pg_wait_backfill -o 0
> Response from "controller03.mgmt.cloud:10051": "processed: 0; failed: 1;
> total: 1; seconds spent: 0.000032"
> sent: 1; skipped: 0; total: 1
>
> Gary.
>
>
>
> On 2019-12-11 10:54 a.m., Gary Molenkamp wrote:
>> After updating/restarting the manager to v14.2.5 we are no longer able
>> to send data to our zabbix servers.
>>
>> Ceph reports a non-zero exit status from zabbix_sender, but I have not
>> been able to identify the cause of the non-zero exit.
>>
>> # ceph health detail
>> HEALTH_WARN Failed to send data to Zabbix
>> MGR_ZABBIX_SEND_FAILED Failed to send data to Zabbix
>>     /usr/bin/zabbix_sender exited non-zero:
>>
>> Setting "debug mgr = 20" yields no additional information that I could
>> see wrt to above issue.
>>
>> zabbix configuration in ceph has not changed since the v14.2.5 update,
>> and was working under v14.2.4:
>>
>> # ceph zabbix config-show
>> {"zabbix_port": 10051, "zabbix_host": "controller03.mgmt.cloud",
>> "identifier": "controller02.mgmt.cloud", "zabbix_sender":
>> "/usr/bin/zabbix_sender", "interval": 60}
>>
>> And I can force a send without error:
>> # /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s
>> controller02.mgmt.cloud -p 10051 -k ceph.total_used_bytes -o 0
>> Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0;
>> total: 1; seconds spent: 0.000062"
>> sent: 1; skipped: 0; total: 1
>> # echo $?
>> 0
>>
>> Any pointers/assistance would be appreciated.
>> Thanks
>> Gary
>>
>>
>
> --
> Gary Molenkamp            Computer Science/Science Technology Services
> Systems Administrator     University of Western Ontario
> molenkam@xxxxxx           http://www.csd.uwo.ca
> (519) 661-2111 x86882     (519) 661-3566
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
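
For anyone hitting the same warning after an upgrade: the manual check Gary ran (send a dummy value for each candidate key and read the "processed/failed" summary) can be scripted. What follows is only a rough sketch, not anything shipped with Ceph or Zabbix. The function names parse_summary, key_accepted and probe_key are made up for illustration, and it assumes zabbix_sender prints its summary in the same form as in the transcripts above. Note that probing a key pushes a real value of 0 to the server, just as the manual commands did.

```python
import re
import subprocess

# Matches the summary zabbix_sender prints, e.g.
# 'processed: 1; failed: 0; total: 1; seconds spent: 0.000059'
# \s* also tolerates the summary being wrapped across lines.
SUMMARY_RE = re.compile(r"processed:\s*(\d+);\s*failed:\s*(\d+);\s*total:\s*(\d+)")


def parse_summary(output):
    """Return (processed, failed, total) from zabbix_sender output, or None."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def key_accepted(output):
    """True if the server processed every value sent and rejected none."""
    summary = parse_summary(output)
    if summary is None:
        return False
    processed, failed, total = summary
    return total > 0 and failed == 0 and processed == total


def probe_key(key, server, host, port=10051, sender="/usr/bin/zabbix_sender"):
    """Send a dummy value of 0 for one key; report whether it was accepted.

    Hypothetical helper: it uses the same -z/-s/-p/-k/-o options as the
    manual commands in this thread.
    """
    cmd = [sender, "-z", server, "-s", host, "-p", str(port), "-k", key, "-o", "0"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return key_accepted(result.stdout)
```

Calling probe_key once with ceph.num_pg_backfill_wait and once with ceph.num_pg_wait_backfill (server controller03.mgmt.cloud, host controller02.mgmt.cloud in Gary's setup) should show which of the two names the Zabbix server's template currently knows about.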