I have a small Ceph cluster (3 nodes, 5 OSDs each, journals all just partitions on the spinner disks). I have noticed that when I hit it with a bunch of rados bench clients, all doing writes of large (40M) objects with --no-cleanup, the rados bench commands seem to finish OK, but I often get health warnings like:

    HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
    3 ops are blocked > 32.768 sec on osd.9
    1 ops are blocked > 32.768 sec on osd.10
    2 osds have slow requests

After a couple of minutes, health goes back to HEALTH_OK.

But if I go to the node containing osd.10, for example, and do dump_historic_ops, I see lots of durations of around 20 sec but nothing over 32 sec. The ~20-sec ops are always "ack+ondisk+write+known_if_redirected" with type_data = "commit sent: apply or cleanup", and the following event timings are typical (gaps in ms):

    initiated:                        14:06:58.205937
    reached_pg:                       14:07:01.823288, gap=  3617.351
    started:                          14:07:01.823359, gap=     0.071
    waiting for subops from 3:        14:07:01.855259, gap=    31.900
    commit_queued_for_journal_write:  14:07:03.132697, gap=  1277.438
    write_thread_in_journal_buffer:   14:07:03.143356, gap=    10.659
    journaled_completion_queued:      14:07:04.175863, gap=  1032.507
    op_commit:                        14:07:04.585040, gap=   409.177
    op_applied:                       14:07:04.589751, gap=     4.711
    sub_op_commit_rec from 3:         14:07:14.682925, gap= 10093.174
    commit_sent:                      14:07:14.683081, gap=     0.156
    done:                             14:07:14.683119, gap=     0.038

Should I expect to see a historic op with a duration greater than 32 sec?

-- Tom Deneau
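
P.S. In case it is useful, here is a minimal sketch of the kind of check I am running, assuming the standard admin socket command "ceph daemon osd.<id> dump_historic_ops" and that its JSON output exposes an ops list with "duration" and "description" fields (key names and capitalization vary between releases, so treat this as illustrative rather than exact):

    #!/usr/bin/env python3
    # Sketch: scan dump_historic_ops from one OSD's admin socket and report
    # any op whose duration exceeds the 32.768 s slow-request threshold.
    # Assumes it runs on the node hosting the OSD; JSON field names are
    # assumptions and differ across Ceph releases.
    import json
    import subprocess

    OSD_ID = 10            # the OSD named in the health warning
    THRESHOLD = 32.768     # seconds, matching "blocked > 32.768 sec"

    raw = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % OSD_ID, "dump_historic_ops"])
    dump = json.loads(raw.decode("utf-8"))

    # older releases use "Ops", newer ones "ops"
    ops = dump.get("ops", dump.get("Ops", []))
    slow = [op for op in ops if float(op.get("duration", 0)) > THRESHOLD]

    print("%d of %d historic ops on osd.%d exceed %.3f s"
          % (len(slow), len(ops), OSD_ID, THRESHOLD))
    for op in slow:
        print(op.get("duration"), op.get("description", ""))

I run it on the node hosting the OSD while the warning is still active, since the historic ops list only keeps a bounded window of recent ops.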