Hi everyone,

Many, many thanks to all of you!

The root cause was a failed OS drive on one storage node. The server was
responsive to ping, but it was impossible to log in. After a reboot via IPMI,
the docker daemon failed to start due to I/O errors, and dmesg complained
about the failing OS disk.

I failed to catch the problem initially since 'ceph -s' kept showing
HEALTH_OK and the cluster was "functional" despite the slow performance.

I really appreciate all the tips and advice I received from you all and
learned a lot. I will carry your advice (e.g. using BlueStore, enterprise
SSDs/HDDs, separating public and cluster traffic, etc.) into my next round
of PoC.

Thank you very much!

Best regards,
Cody

On Tue, Nov 27, 2018 at 6:31 AM Vitaliy Filippov <vitalif@xxxxxxxxxx> wrote:
>
> > CPU: 2 x E5-2603 @1.8GHz
> > RAM: 16GB
> > Network: 1G port shared for Ceph public and cluster traffic
> > Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> > OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
>
> 0.84 MB/s sequential write is impossibly bad - it's not normal with any
> kind of device, even with a 1G network. You probably have some kind of
> problem in your setup: maybe the network RTT is very high, maybe the OSD
> or mon nodes are shared with other running tasks and overloaded, or maybe
> your disks are already dead... :))
>
> > As I moved on to test block devices, I got the following error message:
> >
> > # rbd map image01 --pool testbench --name client.admin
>
> You don't need to map it to run benchmarks, use `fio --ioengine=rbd`
> (however you'll still need /etc/ceph/ceph.client.admin.keyring)
>
> --
> With best regards,
> Vitaliy Filippov

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
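
As a concrete follow-up to the `fio --ioengine=rbd` suggestion quoted above,
here is a minimal sketch of such an invocation, reusing the pool (testbench)
and image (image01) from the earlier test. The block size, queue depth, and
size are illustrative only, and fio must be built with librbd support for the
rbd ioengine to be available:

    # fio --ioengine=rbd --clientname=admin --pool=testbench --rbdname=image01 \
          --name=seq-write --rw=write --bs=4M --iodepth=16 --size=1G

No 'rbd map' (and no krbd module) is needed for this; fio talks to the
cluster directly through librbd, authenticating with
/etc/ceph/ceph.client.admin.keyring.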
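
For the public/cluster traffic separation mentioned earlier in the thread, a
sketch of the relevant ceph.conf settings; the subnets below are placeholders
rather than values from this cluster:

    [global]
    public network  = 192.168.10.0/24   # client and monitor traffic (placeholder subnet)
    cluster network = 192.168.20.0/24   # OSD replication/backfill traffic (placeholder subnet)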