A very small 3-node Ceph cluster with this OSD tree: http://pastebin.com/mUhayBk9 has some performance issues. All 27 OSDs are 5TB SATA drives, it keeps two copies of everything, and it's really only intended for nearline backups of large data objects.

All of the OSDs look OK in terms of utilization. iostat reports them around 5-10 %util with occasional spikes up to 20%, typically doing 5-20 IOPS each. (Which, if we figure 100 IOPS per drive, matches up nicely with the %util.) The Ceph nodes also look healthy: CPU is 75-90% idle, and there is plenty of RAM (the smaller node has 32GiB, the larger ones have 64GiB). They have dual 10GBase-T NICs on an unloaded, dedicated storage switch.

The pool has 4096 placement groups. We currently have noscrub and nodeep-scrub set to eliminate scrubbing as a source of performance problems. When we increased the placement group count from the default to 4096, a ton of data moved, and quickly; the cluster is clearly capable of pushing data around at multi-gigabit speeds when it wants to.

Yet this cluster has (currently) one client: a KVM machine backed by a 32TB RBD image. It is a Linux VM running ZFS that receives ZFS snapshots from other machines to back them up. Its performance is pretty bad. While receiving even a single snapshot, iostat -x on the client reports that it is choking on I/O to the Ceph RBD image: 99.88 %util while doing 50-75 IOPS and 5-20 MB/sec of throughput.

Is there anything we can do to improve the performance of this configuration, or at least figure out why it is so bad? These are large SATA drives with no SSD OSDs, so we don't expect miracles, but it sure would be nice to see client I/O better than 50 IOPS and 20MB/sec.

Thanks in advance for any help or guidance on this!
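
P.S. In case it helps to separate the RADOS layer from the RBD/VM path, here is a rough sketch of the kind of sequential-write probe we could run from the client host using the Python rados bindings. The pool name 'rbd', object count, and object names are placeholders, not our actual values; corrections welcome if this isn't a sensible way to test.

    #!/usr/bin/env python
    # Rough sequential-write probe against the pool backing the RBD image.
    # Pool name, object count, and object names are placeholders.
    import time
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')          # assumed pool name

    obj_size = 4 * 1024 * 1024                 # 4 MiB, the default RBD object size
    payload = b'\0' * obj_size
    count = 64                                 # 256 MiB total

    start = time.time()
    for i in range(count):
        # Synchronous full-object writes, one at a time.
        ioctx.write_full('perf-probe-%d' % i, payload)
    elapsed = time.time() - start
    print('%d x 4 MiB in %.1fs -> %.1f MB/s' %
          (count, elapsed, count * obj_size / elapsed / 1e6))

    # Clean up the probe objects.
    for i in range(count):
        ioctx.remove_object('perf-probe-%d' % i)

    ioctx.close()
    cluster.shutdown()

If this reports multi-hundred-MB/sec numbers while the VM still crawls, that would at least point us at the RBD/KVM side rather than the OSDs.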
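
P.P.S. Since zfs recv tends to issue small, mostly synchronous writes, per-write latency may matter more than raw throughput here. Below is a similarly hedged sketch of a latency probe against a throwaway RBD image, using the Python rbd bindings; the image name, sample count, and 4 KiB write size are made-up values for illustration. (For reference, 50-75 IOPS at ~100% utilization works out to roughly 13-20 ms per I/O, so if the probe reports medians in that range the client may simply be serializing on write latency.)

    #!/usr/bin/env python
    # Per-write latency probe against a scratch RBD image (not the production image).
    # Image name, sample count, and write size are placeholders.
    import time
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')          # assumed pool name

    name = 'latency-probe'
    rbd.RBD().create(ioctx, name, 1024 * 1024 * 1024)   # 1 GiB scratch image
    image = rbd.Image(ioctx, name)

    block = b'\0' * 4096
    samples = []
    for i in range(200):
        t0 = time.time()
        image.write(block, i * 4096)           # small write at increasing offsets
        image.flush()                          # force it out, like a sync write
        samples.append(time.time() - t0)

    samples.sort()
    print('median write+flush latency: %.1f ms' % (samples[len(samples) // 2] * 1000))

    # Remove the scratch image again.
    image.close()
    rbd.RBD().remove(ioctx, name)
    ioctx.close()
    cluster.shutdown()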