Hi everyone,

Many, many thanks to all of you!

The root cause was a failed OS drive on one storage node. The server was
responsive to ping, but it was impossible to log in. After a reboot via IPMI,
the docker daemon failed to start due to I/O errors, and dmesg complained
about the failing OS disk.

I failed to catch the problem initially since 'ceph -s' kept showing
HEALTH_OK and the cluster was "functional" despite the slow performance.

I really appreciate all the tips and advice I received from you all and
learned a lot. I will carry your advice (e.g. using BlueStore, enterprise
SSDs/HDDs, separating public and cluster traffic, etc.) into my next round
of PoC.

Thank you very much!

Best regards,
Cody

On Tue, Nov 27, 2018 at 6:31 AM Vitaliy Filippov <vitalif@xxxxxxxxxx> wrote:
>
> > CPU: 2 x E5-2603 @1.8GHz
> > RAM: 16GB
> > Network: 1G port shared for Ceph public and cluster traffic
> > Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> > OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
>
> 0.84 MB/s sequential write is impossibly bad - it's not normal with any
> kind of device, even with a 1G network. You probably have some kind of
> problem in your setup: maybe the network RTT is very high, maybe the OSD
> or mon nodes are shared with other running tasks and overloaded, or maybe
> your disks are already dead... :))
>
> > As I moved on to test block devices, I got the following error message:
> >
> > # rbd map image01 --pool testbench --name client.admin
>
> You don't need to map it to run benchmarks, use `fio --ioengine=rbd`
> (however you'll still need /etc/ceph/ceph.client.admin.keyring)
>
> --
> With best regards,
> Vitaliy Filippov

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
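
As a concrete follow-up to the `fio --ioengine=rbd` suggestion quoted above,
here is a minimal sketch of such an invocation, reusing the pool (testbench)
and image (image01) from the earlier test. The block size, queue depth, and
size are illustrative only, and fio must be built with librbd support for the
rbd ioengine to be available:

    # fio --ioengine=rbd --clientname=admin --pool=testbench --rbdname=image01 \
          --name=seq-write --rw=write --bs=4M --iodepth=16 --size=1G

No 'rbd map' (and no krbd module) is needed for this; fio talks to the
cluster directly through librbd, authenticating with
/etc/ceph/ceph.client.admin.keyring.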
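
For the public/cluster traffic separation mentioned earlier in the thread, a
sketch of the relevant ceph.conf settings; the subnets below are placeholders
rather than values from this cluster:

    [global]
    public network  = 192.168.10.0/24   # client and monitor traffic (placeholder subnet)
    cluster network = 192.168.20.0/24   # OSD replication/backfill traffic (placeholder subnet)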