I dislike replying to my own post, but I found the issue:

Looking at the changelog for 14.2.5, the zabbix key
ceph.num_pg_wait_backfill has been renamed to ceph.num_pg_backfill_wait.
This needs to be updated in the zabbix_template.yml.

Before the change:

# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_backfill_wait -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 0; failed: 1; total: 1; seconds spent: 0.000033"
sent: 1; skipped: 0; total: 1

# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_wait_backfill -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000059"
sent: 1; skipped: 0; total: 1

After the key update:

# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_backfill_wait -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000053"
sent: 1; skipped: 0; total: 1

# /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s controller02.mgmt.cloud -p 10051 -k ceph.num_pg_wait_backfill -o 0
Response from "controller03.mgmt.cloud:10051": "processed: 0; failed: 1; total: 1; seconds spent: 0.000032"
sent: 1; skipped: 0; total: 1

Gary.

On 2019-12-11 10:54 a.m., Gary Molenkamp wrote:
> After updating/restarting the manager to v14.2.5 we are no longer able
> to send data to our zabbix servers.
>
> Ceph reports a non-zero exit status from zabbix_sender, but I have not
> been able to identify the cause of the non-zero exit.
>
> # ceph health detail
> HEALTH_WARN Failed to send data to Zabbix
> MGR_ZABBIX_SEND_FAILED Failed to send data to Zabbix
>     /usr/bin/zabbix_sender exited non-zero:
>
> Setting "debug mgr = 20" yields no additional information that I could
> see wrt to above issue.
>
> zabbix configuration in ceph has not changed since the v14.2.5 update,
> and was working under v14.2.4:
>
> # ceph zabbix config-show
> {"zabbix_port": 10051, "zabbix_host": "controller03.mgmt.cloud",
> "identifier": "controller02.mgmt.cloud", "zabbix_sender":
> "/usr/bin/zabbix_sender", "interval": 60}
>
> And I can force a send without error:
> # /usr/bin/zabbix_sender -z controller03.mgmt.cloud -s
> controller02.mgmt.cloud -p 10051 -k ceph.total_used_bytes -o 0
> Response from "controller03.mgmt.cloud:10051": "processed: 1; failed: 0;
> total: 1; seconds spent: 0.000062"
> sent: 1; skipped: 0; total: 1
> # echo $?
> 0
>
> Any pointers/assistance would be appreciated.
> Thanks
> Gary
>
>

--
Gary Molenkamp          Computer Science/Science Technology Services
Systems Administrator   University of Western Ontario
molenkam@xxxxxx         http://www.csd.uwo.ca
(519) 661-2111 x86882   (519) 661-3566
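
P.S. For anyone who wants to check which key names their Zabbix server currently accepts, here is a minimal sketch of the manual test above in Python. It is not part of the Ceph zabbix module; the function names (parse_summary, key_accepted, probe_key) are made up for illustration, and it assumes zabbix_sender prints the same "processed: N; failed: N; total: N" summary shown in the output above. Note that probe_key sends a real value of 0 for the key, exactly as the manual commands do.

```python
import re
import subprocess

# Matches the summary that zabbix_sender printed in the examples above, e.g.
# 'processed: 1; failed: 0; total: 1; seconds spent: 0.000053'
_SUMMARY = re.compile(
    r"processed:\s*(\d+);\s*failed:\s*(\d+);\s*total:\s*(\d+)"
)


def parse_summary(output):
    """Return (processed, failed, total) from zabbix_sender output, or None."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    return tuple(int(n) for n in match.groups())


def key_accepted(output):
    """True if the server processed every value sent and rejected none."""
    counts = parse_summary(output)
    if counts is None:
        return False
    processed, failed, total = counts
    return total > 0 and failed == 0 and processed == total


def probe_key(server, host, key, port=10051, sender="/usr/bin/zabbix_sender"):
    """Send a dummy 0 for `key` (hypothetical helper) and report acceptance.

    Uses only the flags from the commands above: -z, -s, -p, -k, -o.
    """
    cmd = [sender, "-z", server, "-s", host, "-p", str(port), "-k", key, "-o", "0"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return key_accepted(result.stdout)
```

Calling probe_key("controller03.mgmt.cloud", "controller02.mgmt.cloud", key) once with ceph.num_pg_wait_backfill and once with ceph.num_pg_backfill_wait should show which of the two names the imported template still defines.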