Hi John and Sage,

As you know, I am working on [1]. Slow-request alerts are pretty much a list of strings, in which the first one is a summary and the following ones are the details, like:

- 1 slow requests, 1 included below; oldest blocked for > 30.005692 secs
- slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]

This fits well into a health_check_t struct, and we can add a field to MMgrReport and send it to the mgr periodically. But the mgr side is supposed to compose a single std::map<string, health_check_t> in MMonMgrReport and send it to the monitor.

If we put all slow requests from all OSDs into this map with keys like "OSD_SLOW_OPS/${osd_id}", a slow cluster will put a heavy load on the monstore, and the "health" section of "ceph status" will be flooded with slow requests.

Or we can just collect all the slow request details into a single bucket of "OSD_SLOW_OPS". But if we just send the summaries from the OSDs as the "health_check_t::detail" with the alert code "OSD_SLOW_OPS", all the details are practically stripped off, and the total *number* of slow requests can be found nowhere unless the user parses the summary lines and sums them up manually.

We could refactor OpTracker::check_ops_in_flight() so that it returns an array of info describing slow requests instead of a list of human-readable strings. But we still need to face this level-of-detail problem.

Any thoughts?

---
[1] https://trello.com/c/8f9y0YM6/51-osd-stateful-health-warnings-to-mgr-mon-eg-slow-requests

--
Regards
Kefu Chai
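
To make the single-bucket option a bit more concrete, here is a rough sketch of what the mgr-side aggregation could look like if the OSDs reported structured info instead of strings. Everything in it is an assumption for illustration: HealthCheck is a simplified stand-in for health_check_t (not the real definition), and SlowOpReport, aggregate_slow_ops() and the max_detail cap are hypothetical names, not existing Ceph code. It only shows that one "OSD_SLOW_OPS" entry can carry the cluster-wide total while keeping a bounded number of detail lines.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for health_check_t (assumption: summary + detail only).
struct HealthCheck {
  std::string summary;
  std::list<std::string> detail;
};

// Hypothetical structured report from one OSD, i.e. what a refactored
// OpTracker::check_ops_in_flight() might hand over instead of strings.
struct SlowOpReport {
  int osd_id;
  std::size_t num_slow;              // number of slow requests on this OSD
  std::vector<std::string> details;  // per-request descriptions
};

// Fold all per-OSD reports into a single "OSD_SLOW_OPS" bucket.
// The total is always exact; detail lines are capped at max_detail so the
// map sent to the monitor stays small even on a very slow cluster.
std::map<std::string, HealthCheck>
aggregate_slow_ops(const std::vector<SlowOpReport>& reports,
                   std::size_t max_detail)
{
  std::map<std::string, HealthCheck> checks;
  std::size_t total = 0;
  std::size_t osds = 0;
  std::list<std::string> detail;

  for (const auto& r : reports) {
    if (r.num_slow == 0) {
      continue;  // healthy OSD, nothing to report
    }
    total += r.num_slow;
    ++osds;
    for (const auto& d : r.details) {
      if (detail.size() >= max_detail) {
        break;  // keep counting totals, but stop adding detail lines
      }
      detail.push_back("osd." + std::to_string(r.osd_id) + ": " + d);
    }
  }

  if (total == 0) {
    return checks;  // no slow requests, so no health check is raised
  }

  HealthCheck& c = checks["OSD_SLOW_OPS"];
  c.summary = std::to_string(total) + " slow requests on " +
              std::to_string(osds) + " osds";
  c.detail = std::move(detail);
  return checks;
}
```

With something like this the user would not need to sum up summary lines by hand, but the question of how many details to keep (the max_detail cap here) is still open.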