Thanks for the report! I've opened a PR [1] to fix the issue, which was
introduced when the "more" flag was added to the "cls_cxx_map_get_vals"
method.

[1] https://github.com/ceph/ceph/pull/18270

On Thu, Oct 12, 2017 at 7:00 AM, 文刘飞 <wenliufei@xxxxxxxxxxxxxxx> wrote:
>
> Hi Jason,
> I tested the rbd journal in our environment and found that some OSDs
> were marked down. The log of one of the OSDs that was marked down is
> as follows:
>
> 2017-10-12 18:45:35.842402 7fcc5b7ff700 1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcc383ff700' had timed out after 60
> 2017-10-12 18:45:35.842416 7fcc5b7ff700 1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fcc407ff700' had timed out after 60
>
> The root cause of the osd_op_tp thread timeouts is that the OSD gets
> stuck in the journal_tag_list function.
> MAX_KEYS_READ is 64. If cls_cxx_map_get_vals finds more than 64
> values, "more" will be true, tag_pass never changes, and
> cls_cxx_map_get_vals is called in an infinite loop!
>
>
> int journal_tag_list(cls_method_context_t hctx, bufferlist *in,
>                      bufferlist *out) {
>   .....
>
>   std::string last_read = HEADER_KEY_TAG_PREFIX;
>   do {
>     std::map<std::string, bufferlist> vals;
>     bool more;
>     r = cls_cxx_map_get_vals(hctx, last_read, HEADER_KEY_TAG_PREFIX,
>                              MAX_KEYS_READ, &vals, &more);
>     if (r < 0 && r != -ENOENT) {
>       CLS_ERR("failed to retrieve tags: %s", cpp_strerror(r).c_str());
>       return r;
>     }
>     ....
>
>     if (tag_pass != TAG_PASS_DONE && !more) {
>       last_read = HEADER_KEY_TAG_PREFIX;
>       ++tag_pass;
>     } else if (!vals.empty()) {
>       last_read = vals.rbegin()->first;
>     }
>   } while (tag_pass != TAG_PASS_DONE);
>
>   ::encode(tags, *out);
>   return 0;
> }
>
>

--
Jason
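For readers following the thread, here is a rough standalone model of the
paging pattern in the quoted function. It is only a sketch and is not the
change made in the PR above. The names get_vals_page and scan_passes, and
the std::map<std::string, int> store, are hypothetical stand-ins and not
part of the Ceph cls API. The extra vals.empty() check is a defensive
assumption: it ends a pass whenever a page returns no keys, so the loop
cannot spin even if "more" were reported as true.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a paged, prefix-filtered key read.
// NOT the Ceph cls_cxx_map_get_vals implementation.
// Fills *vals with up to max_keys entries whose key sorts after
// start_after and begins with prefix; sets *more to true only when at
// least one further matching key remains.
void get_vals_page(const std::map<std::string, int>& store,
                   const std::string& start_after,
                   const std::string& prefix,
                   std::size_t max_keys,
                   std::map<std::string, int>* vals,
                   bool* more) {
  vals->clear();
  *more = false;
  for (auto it = store.upper_bound(start_after); it != store.end(); ++it) {
    if (it->first.compare(0, prefix.size(), prefix) != 0) {
      break;  // keys are sorted, so the prefix range has ended
    }
    if (vals->size() == max_keys) {
      *more = true;  // page is full and another matching key exists
      break;
    }
    vals->insert(*it);
  }
}

// Scans the whole prefix range `passes` times, one page at a time, and
// returns the total number of entries visited across all passes.
std::size_t scan_passes(const std::map<std::string, int>& store,
                        const std::string& prefix,
                        std::size_t max_keys,
                        int passes) {
  std::size_t visited = 0;
  std::string last_read = prefix;
  int pass = 0;
  while (pass < passes) {
    std::map<std::string, int> vals;
    bool more = false;
    get_vals_page(store, last_read, prefix, max_keys, &vals, &more);
    visited += vals.size();
    if (!more || vals.empty()) {
      // End of this pass. An empty page also ends the pass, so every
      // iteration either advances last_read or advances pass.
      last_read = prefix;
      ++pass;
    } else {
      last_read = vals.rbegin()->first;  // resume after the last key read
    }
  }
  return visited;
}
```

With 150 matching keys and a page size of 64, each pass in this model reads
pages of 64, 64 and 22 keys before moving on to the next pass.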