This is the same cluster I posted about back in April. Since then, the situation has gotten significantly worse.

Here is what iostat looks like for the one active RBD image on this cluster:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vdb 0.00 0.00 14.10 0.00 685.65 0.00 97.26 3.43 299.40 299.40 0.00 70.92 100.00
vdb 0.00 0.00 1.10 0.00 140.80 0.00 256.00 3.00 2753.09 2753.09 0.00 909.09 100.00
vdb 0.00 0.00 17.40 0.00 2227.20 0.00 256.00 3.00 178.78 178.78 0.00 57.47 100.00
vdb 0.00 0.00 1.30 0.00 166.40 0.00 256.00 3.00 2256.62 2256.62 0.00 769.23 100.00
vdb 0.00 0.00 8.20 0.00 1049.60 0.00 256.00 3.00 362.10 362.10 0.00 121.95 100.00
vdb 0.00 0.00 1.10 0.00 140.80 0.00 256.00 3.00 2517.45 2517.45 0.00 909.45 100.04
vdb 0.00 0.00 1.10 0.00 140.66 0.00 256.00 3.00 2863.64 2863.64 0.00 909.09 99.90
vdb 0.00 0.00 0.70 0.00 89.60 0.00 256.00 3.00 3898.86 3898.86 0.00 1428.57 100.00
vdb 0.00 0.00 0.60 0.00 76.80 0.00 256.00 3.00 5093.33 5093.33 0.00 1666.67 100.00
vdb 0.00 0.00 1.20 0.00 153.60 0.00 256.00 3.00 2568.33 2568.33 0.00 833.33 100.00
vdb 0.00 0.00 1.30 0.00 166.40 0.00 256.00 3.00 2457.85 2457.85 0.00 769.23 100.00
vdb 0.00 0.00 13.90 0.00 1779.20 0.00 256.00 3.00 220.95 220.95 0.00 71.94 100.00
vdb 0.00 0.00 1.00 0.00 128.00 0.00 256.00 3.00 2250.40 2250.40 0.00 1000.00 100.00
vdb 0.00 0.00 1.30 0.00 166.40 0.00 256.00 3.00 2798.77 2798.77 0.00 769.23 100.00
vdb 0.00 0.00 0.90 0.00 115.20 0.00 256.00 3.00 3304.00 3304.00 0.00 1111.11 100.00
vdb 0.00 0.00 0.90 0.00 115.20 0.00 256.00 3.00 3425.33 3425.33 0.00 1111.11 100.00
vdb 0.00 0.00 1.30 0.00 166.40 0.00 256.00 3.00 2290.77 2290.77 0.00 769.23 100.00
vdb 0.00 0.00 4.30 0.00 550.40 0.00 256.00 3.00 721.30 721.30 0.00 232.56 100.00
vdb 0.00 0.00 1.60 0.00 204.80 0.00 256.00 3.00 1894.75 1894.75 0.00 625.00 100.00
vdb 0.00 0.00 1.20 0.00 153.60 0.00 256.00 3.00 2375.00 2375.00 0.00 833.33 100.00
vdb 0.00 0.00 0.90 0.00 115.20 0.00 256.00 3.00 3036.44 3036.44 0.00 1111.11 100.00
vdb 0.00 0.00 1.10 0.00 140.80 0.00 256.00 3.00 3086.18 3086.18 0.00 909.09 100.00
vdb 0.00 0.00 0.90 0.00 115.20 0.00 256.00 3.00 2480.44 2480.44 0.00 1111.11 100.00
vdb 0.00 0.00 1.20 0.00 153.60 0.00 256.00 3.00 3124.33 3124.33 0.00 833.67 100.04
vdb 0.00 0.00 0.80 0.00 102.40 0.00 256.00 3.00 3228.00 3228.00 0.00 1250.00 100.00
vdb 0.00 0.00 1.20 0.00 153.60 0.00 256.00 3.00 2439.33 2439.33 0.00 833.33 100.00
vdb 0.00 0.00 1.30 0.00 166.40 0.00 256.00 3.00 2567.08 2567.08 0.00 769.23 100.00
vdb 0.00 0.00 0.80 0.00 102.40 0.00 256.00 3.00 3023.00 3023.00 0.00 1250.00 100.00
vdb 0.00 0.00 4.80 0.00 614.40 0.00 256.00 3.00 712.50 712.50 0.00 208.33 100.00
vdb 0.00 0.00 1.30 0.00 118.75 0.00 182.69 3.00 2003.69 2003.69 0.00 769.23 100.00
vdb 0.00 0.00 10.50 0.00 1344.00 0.00 256.00 3.00 344.46 344.46 0.00 95.24 100.00

So: between 0 and 15 reads per second, no write activity, a constant queue depth of 3+, wait times measured in seconds, and 100% I/O utilization, all for read throughput of 100-200 KB/sec. Even trivial writes can hang for 15-60 seconds before completing. Sometimes this behavior will "go away" for a while and things go back to what we saw in April: 50 IOPS (read or write) and 5-20 MB/sec of I/O throughput. But it always comes back.
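(For reference, the capture above is ordinary extended iostat output taken inside the guest. To separate the librbd/QEMU path from the cluster itself, the backing pool can also be benchmarked directly with rados bench; the sketch below assumes the image lives in a pool named "rbd" and that --no-cleanup is supported in 0.80.x, so adjust as needed.)

    # Extended per-device stats at a 1-second interval, inside the guest
    # (roughly how the vdb numbers above were gathered):
    iostat -x vdb 1

    # Benchmark the backing pool directly from a node with a client keyring,
    # bypassing librbd/QEMU entirely. "rbd" is a placeholder for the pool
    # that actually holds the image.
    rados bench -p rbd 30 write --no-cleanup   # keep the objects...
    rados bench -p rbd 30 seq                  # ...so the sequential read test has data
    # (the leftover benchmark objects should be removed from the pool afterwards)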
The hardware of the ceph cluster is:

- Three ceph nodes
- Two of the ceph nodes have 64GiB RAM and 12 x 5TB SATA drives
- One of the ceph nodes has 32GiB RAM and 4 x 5TB SATA drives
- All ceph nodes have Intel E5-2609 v2 (2.50GHz quad-core) CPUs
- Everything is 10GBase-T
- All three nodes are running Ceph 0.80.9

The ceph hardware is all borderline idle. CPU utilization is 3-5%, and iostat reports the individual disks hovering around 4-7% utilization at any given time. The nodes do appear to be using most of the available RAM for OSD caching.

The client is a KVM virtual machine running on a server by itself. Inside the virtual machine, the CPU is reported as 100% iowait. Outside the VM, the host itself reports as essentially idle (99.1% idle).

Something is *definitely* wrong. Does anyone have any idea what it might be? Thanks for any help with this!
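In case more detail would help, here is a rough sketch of the cluster-side checks I can run and post output from; osd.0 and the default admin-socket path below are just placeholders, and I am not certain "ceph osd perf" is present in 0.80.x:

    # Overall cluster state and any slow/blocked request warnings:
    ceph -s
    ceph health detail

    # Per-OSD commit/apply latency summary (if available in this release):
    ceph osd perf

    # Recent slowest ops on one OSD via its admin socket
    # (osd.0 and the default socket path are examples only):
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops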