And this exact problem was one of the reasons why we migrated everything
to PXE boot where the OS runs from RAM. That kind of failure is just the
worst to debug... Also, 1 GB of RAM is cheaper than a separate OS disk.


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Tue, 27 Nov 2018 at 19:22, Cody <codeology.lab@xxxxxxxxx> wrote:
>
> Hi everyone,
>
> Many, many thanks to all of you!
>
> The root cause was a failed OS drive on one storage node. The server
> was responsive to ping, but I was unable to log in. After a reboot via
> IPMI, the docker daemon failed to start due to I/O errors and dmesg
> complained about the failing OS disk. I failed to catch the problem
> initially because 'ceph -s' kept showing a healthy status and the
> cluster stayed "functional" despite the slow performance.
>
> I really appreciate all the tips and advice I received from you all and
> learned a lot. I will carry your advice (e.g. using BlueStore,
> enterprise SSDs/HDDs, separating public and cluster traffic, etc.) into
> my next round of PoC.
>
> Thank you very much!
>
> Best regards,
> Cody
>
> On Tue, Nov 27, 2018 at 6:31 AM Vitaliy Filippov <vitalif@xxxxxxxxxx> wrote:
> >
> > > CPU: 2 x E5-2603 @1.8GHz
> > > RAM: 16GB
> > > Network: 1G port shared for Ceph public and cluster traffic
> > > Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> > > OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
> >
> > 0.84 MB/s sequential write is impossibly bad; it is not normal with any
> > kind of device, even on a 1G network. You probably have some kind of
> > problem in your setup: maybe the network RTT is very high, maybe the OSD
> > or MON nodes are shared with other running tasks and overloaded, or
> > maybe your disks are already dead... :))
> >
> > > As I moved on to test block devices, I got the following error message:
> > >
> > > # rbd map image01 --pool testbench --name client.admin
> >
> > You don't need to map it to run benchmarks, use `fio --ioengine=rbd`
> > (however, you'll still need /etc/ceph/ceph.client.admin.keyring).
> >
> > --
> > With best regards,
> >   Vitaliy Filippov
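A minimal fio invocation along the lines Vitaliy suggests, reusing the
pool and image names from the thread (the block size, queue depth, and
runtime below are placeholder values, not figures from the thread):

    fio --ioengine=rbd --clientname=admin --pool=testbench --rbdname=image01 \
        --name=rbd-seq-write --rw=write --bs=4M --iodepth=16 \
        --runtime=60 --time_based

This goes through librbd directly, so no `rbd map` (and no kernel RBD
client) is needed; fio only has to be able to read /etc/ceph/ceph.conf
and the client.admin keyring.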