Dear Ceph gurus,

We have a Ceph cluster (version 0.80.5, 38b73c67d375a2552d8ed67843c8a65c2c0feba6) with 4 MONs and 16 OSDs (4 per host) used as backend storage for libvirt.

Hosts: Ubuntu 14.04
CPU: 2x Xeon X5650
RAM: 48 GB (no swap)
No SSD for journals
HDD: 4x WDC WD2003FYYS-02W0B0 (2 TB, 7200 rpm), each dedicated to an OSD (one partition for the journal, the rest for the OSD data)
FS: btrfs (I know it is not recommended in the documentation and I hope it is not the culprit)
Network: dedicated 10GbE

As we added more VMs to the cluster, we started to see sporadic, very high I/O wait inside the VMs. The hosts running the OSDs seem fine. I followed a similar discussion here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040621.html

Here is an example of an operation that took a long time:

{ "description": "osd_op(client.5275.0:262936 rbd_data.22e42ae8944a.0000000000000807 [] 3.c9699248 ack+ondisk+write e3158)",
  "received_at": "2014-09-23 15:23:30.820958",
  "age": "108.329989",
  "duration": "5.814286",
  "type_data": [
        "commit sent; apply or cleanup",
        { "client": "client.5275",
          "tid": 262936},
        [ { "time": "2014-09-23 15:23:30.821097",
            "event": "waiting_for_osdmap"},
          { "time": "2014-09-23 15:23:30.821282",
            "event": "reached_pg"},
          { "time": "2014-09-23 15:23:30.821384",
            "event": "started"},
          { "time": "2014-09-23 15:23:30.821401",
            "event": "started"},
          { "time": "2014-09-23 15:23:30.821459",
            "event": "waiting for subops from 14"},
          { "time": "2014-09-23 15:23:30.821561",
            "event": "commit_queued_for_journal_write"},
          { "time": "2014-09-23 15:23:30.821666",
            "event": "write_thread_in_journal_buffer"},
          { "time": "2014-09-23 15:23:30.822591",
            "event": "op_applied"},
          { "time": "2014-09-23 15:23:30.824707",
            "event": "sub_op_applied_rec"},
          { "time": "2014-09-23 15:23:31.225157",
            "event": "journaled_completion_queued"},
          { "time": "2014-09-23 15:23:31.225297",
            "event": "op_commit"},
          { "time": "2014-09-23 15:23:36.635085",
            "event": "sub_op_commit_rec"},
          { "time": "2014-09-23 15:23:36.635132",
            "event": "commit_sent"},
          { "time": "2014-09-23 15:23:36.635244",
            "event": "done"}]]}

sub_op_commit_rec took about 5 seconds to complete, which makes me think that replication is the bottleneck. I did not find any timeouts in the Ceph logs, and the cluster is healthy. When some VMs show high I/O wait, the cluster is only doing between 2 and 40 op/s.

I would gladly provide more information if needed. How can I dig deeper? For example, is there a way to know to which OSD the replication was sent for that specific operation?

Cheers,
Alexandre Bécholey
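P.S. To see where the time actually went, I ranked the gaps between consecutive events with a quick Python script. The timestamps below are copied verbatim from the dump above; nothing else is assumed:

```python
from datetime import datetime

# (timestamp, event) pairs copied from the dump_historic_ops output above
events = [
    ("2014-09-23 15:23:30.821097", "waiting_for_osdmap"),
    ("2014-09-23 15:23:30.821282", "reached_pg"),
    ("2014-09-23 15:23:30.821384", "started"),
    ("2014-09-23 15:23:30.821401", "started"),
    ("2014-09-23 15:23:30.821459", "waiting for subops from 14"),
    ("2014-09-23 15:23:30.821561", "commit_queued_for_journal_write"),
    ("2014-09-23 15:23:30.821666", "write_thread_in_journal_buffer"),
    ("2014-09-23 15:23:30.822591", "op_applied"),
    ("2014-09-23 15:23:30.824707", "sub_op_applied_rec"),
    ("2014-09-23 15:23:31.225157", "journaled_completion_queued"),
    ("2014-09-23 15:23:31.225297", "op_commit"),
    ("2014-09-23 15:23:36.635085", "sub_op_commit_rec"),
    ("2014-09-23 15:23:36.635132", "commit_sent"),
    ("2014-09-23 15:23:36.635244", "done"),
]

fmt = "%Y-%m-%d %H:%M:%S.%f"
times = [datetime.strptime(t, fmt) for t, _ in events]

# Time elapsed between each event and the one before it
deltas = [(events[i + 1][1], (times[i + 1] - times[i]).total_seconds())
          for i in range(len(events) - 1)]

# Print the three largest gaps
for name, dt in sorted(deltas, key=lambda x: -x[1])[:3]:
    print(f"{name}: +{dt:.3f}s")
```

The wait before sub_op_commit_rec (~5.4 s) accounts for almost all of the 5.8 s total duration, which is why I suspect the replica side rather than the primary's own journal or apply path.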