Based on the ceph -w output, my guess is that the osd_heartbeat_grace default of 20 is causing my reporting issues. I've seen failures, all of which recover, from reports arriving after 22 to ~28 seconds. I was unable to set osd_heartbeat_grace using the runtime command - the command failed with every syntax I tried. I changed the setting in ceph.conf and restarted all of the daemons. The runtime config now reflects the new osd_heartbeat_grace of 30, but I still see OSD failures in the ceph -w output for reporting outside the 20-second grace.

- What am I overlooking?
- What is the proper syntax for changing osd_heartbeat_grace at runtime? (A guess at the syntax, based on the deprecation hint below, is sketched at the end of this mail.)

[root@ceph0 ceph]# ceph osd tell osd.* injectargs '--osd_heartbeat_grace 30'
"osd tell" is deprecated; try "tell osd.<id>" instead (id can be "*")
[root@ceph0 ceph]# ceph osd tell osd.20 injectargs '--osd_heartbeat_grace 30'
"osd tell" is deprecated; try "tell osd.<id>" instead (id can be "*")
[root@ceph0 ceph]# ceph osd tell * injectargs '--osd_heartbeat_grace 30'
"osd tell" is deprecated; try "tell osd.<id>" instead (id can be "*")
[root@ceph0 ceph]# ceph ceph-osd tell osd.* injectargs '--osd_heartbeat_grace 30'
no valid command found; 10 closest matches:

After making the change in ceph.conf and restarting all daemons, osd_heartbeat_grace now reports 30, but OSDs are still being failed for exceeding the 20-second default grace.

[root@ceph0 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok config show | grep grace
  "mon_osd_adjust_heartbeat_grace": "true",
  "mds_beacon_grace": "15",
  "osd_heartbeat_grace": "30",
[root@ceph0 ceph]#

2014-08-23 14:16:18.069827 mon.0 [INF] osd.20 209.243.160.83:6806/23471 failed (76 reports from 20 peers after 24.267838 >= grace 20.994852)
2014-08-23 14:13:20.057523 osd.26 [WRN] map e28337 wrongly marked me down

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Bruce McFarland
Sent: Saturday, August 23, 2014 1:24 PM
To: ceph-users@ceph.com
Subject: Monitor/OSD report tuning question

Hello,
I have a cluster with 30 OSDs distributed over 3 storage servers connected by a 10G cluster link and connected to the monitor over 1G. I still have a lot to understand with Ceph. Observing the cluster messages in a "ceph -w" (watch) window, I see a lot of OSD "flapping" while the cluster sits in a configured state, with placement groups (PGs) constantly changing status. The cluster was configured and came up to 1920 'active+clean' PGs. The 3 status outputs below were issued over the course of a couple of minutes. As you can see, there is a lot of activity where, I'm assuming, OSD reporting occasionally falls outside the heartbeat timeout (TO), and various PGs get set to 'stale' and/or 'degraded' but still 'active'. There are OSDs being marked down in the osd map; I see them in the watch window as reported failures that very quickly report "wrongly marked me down". I'm assuming I need to 'tune' some of the many TO values so that the OSDs and PGs can all report within the TO window.
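For what it's worth, the kind of ceph.conf change I have in mind looks roughly like the sketch below. The [global] placement (rather than [osd]) and the idea that the monitor needs to see the same grace value when it evaluates failure reports are my own assumptions, not something I've confirmed:

[global]
        # Grace period (seconds) an OSD can miss heartbeats before its peers
        # report it failed; the default is 20. Placing it in [global] is a
        # guess, on the assumption that the monitor consults the same value
        # when it decides whether to mark an OSD down.
        osd heartbeat grace = 30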
A quick look at the --admin-daemon config show cmd tells me that I might consider tuning some of these values:

[root@ceph0 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok config show | grep report
  "mon_osd_report_timeout": "900",
  "mon_osd_min_down_reporters": "1",
  "mon_osd_min_down_reports": "3",
  "osd_mon_report_interval_max": "120",
  "osd_mon_report_interval_min": "5",
  "osd_pg_stat_report_interval_max": "500",
[root@ceph0 ceph]#

Which osd and/or mon settings should I increase/decrease to eliminate all this state flapping while the cluster sits configured with no data?

Thanks,
Bruce

2014-08-23 13:16:15.564932 mon.0 [INF] osd.20 209.243.160.83:6800/20604 failed (65 reports from 20 peers after 23.380808 >= grace 21.991016)
2014-08-23 13:16:15.565784 mon.0 [INF] osd.23 209.243.160.83:6810/29727 failed (79 reports from 20 peers after 23.675170 >= grace 21.990903)
2014-08-23 13:16:15.566038 mon.0 [INF] osd.25 209.243.160.83:6808/31984 failed (65 reports from 20 peers after 23.380921 >= grace 21.991016)
2014-08-23 13:16:15.566206 mon.0 [INF] osd.26 209.243.160.83:6811/518 failed (65 reports from 20 peers after 23.381043 >= grace 21.991016)
2014-08-23 13:16:15.566372 mon.0 [INF] osd.27 209.243.160.83:6822/2511 failed (65 reports from 20 peers after 23.381195 >= grace 21.991016)
...
2014-08-23 13:17:09.547684 osd.20 [WRN] map e27128 wrongly marked me down
2014-08-23 13:17:10.826541 osd.23 [WRN] map e27130 wrongly marked me down
2014-08-23 13:20:09.615826 mon.0 [INF] osdmap e27134: 30 osds: 26 up, 30 in
2014-08-23 13:17:10.954121 osd.26 [WRN] map e27130 wrongly marked me down
2014-08-23 13:17:19.125177 osd.25 [WRN] map e27135 wrongly marked me down

[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26636: 30 osds: 30 up, 30 in
      pgmap v56534: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26586 MB used, 109 TB / 109 TB avail
                1920 active+clean
[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_WARN 160 pgs degraded; 83 pgs stale
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26641: 30 osds: 30 up, 30 in
      pgmap v56545: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26558 MB used, 109 TB / 109 TB avail
                  83 stale+active+clean
                 160 active+degraded
                1677 active+clean
[root@ceph-mon01 ceph]# ceph -s
    cluster f919f2e4-8e3c-45d1-a2a8-29bc604f9f7d
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon01=209.243.160.84:6789/0}, election epoch 2, quorum 0 ceph-mon01
     osdmap e26657: 30 osds: 30 up, 30 in
      pgmap v56584: 1920 pgs, 3 pools, 0 bytes data, 0 objects
            26610 MB used, 109 TB / 109 TB avail
                1920 active+clean
[root@ceph-mon01 ceph]#
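Coming back to the runtime question at the top of this mail: following the deprecation hint above (id can be "*"), the syntax I would try next is sketched below. Treat it as a guess rather than something I have verified; injecting the value into the monitor as well, and the monitor's admin socket path, are my own assumptions about why the 20-second grace is still being enforced.

# Inject the new grace into every OSD (the deprecation hint says id can be "*").
ceph tell osd.* injectargs '--osd_heartbeat_grace 30'

# The failure decisions are logged by mon.0, so presumably the monitor's copy
# of the value matters too (assumption); inject it there as well.
ceph tell mon.ceph-mon01 injectargs '--osd_heartbeat_grace 30'

# Verify what the monitor is actually using (socket path is assumed to follow
# the same /var/run/ceph/ceph-<type>.<id>.asok pattern as the OSD socket above).
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace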