Re: Returning to the performance in a small cluster topic

All "normal" VM usage is about what you'd expect, since a lot of application and system software still dates from the days of spinning disks, when tens of operations per second was the level of committed IOPS you could get. So they let the OS cache writes and only sync when needed.

Some applications, like etcd, are very careful about their state (which is reasonable) and call sync after (basically) every IO. The etcd docs talk about needing a high amount of "sequential IO", which we tested, and which is fine. But what they actually need is a high amount of *committed* (or sync'd; I'm not sure there is a general term for this) IO, which we did not test.
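For what it's worth, etcd's disk-tuning guidance measures exactly this kind of committed IO with an fdatasync-per-write fio job. A sketch along those lines (the block size, file size, and runtime here are illustrative guesses, not etcd's official recommendation):

fio --name=etcd-sync --rw=write --ioengine=sync --fdatasync=1 --bs=2300
--size=22m --runtime=60

Because every write is followed by an fdatasync, the reported IOPS and latency reflect committed IO rather than cached writes.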

Our cluster works great (~1500-2500 IOPS) for the normal VM use case (occasional syncs, cached writes), and thus I don't think it is hyperbole to say it is shocking how much lower (100x) the committed IOPS are.

I do agree that this could be a lot better documented, or a lot more clearly laid out.

Another documentation problem (which the balancer has more or less eliminated) is that, a couple of years ago, the docs tended to make you think you'd get more even utilization if you just added more PGs. In reality, that just gives you a smoother distribution curve rather than a taller/narrower one.

Jordan

On 7/15/2019 3:00 PM, Marc Roos wrote:
Isn't that why you're supposed to test up front? So you don't have shocking
surprises? You can also find some performance references in the mailing
list archives.
I think it would be good to publish some performance results on the
ceph.com website. It can't be too difficult to put some default scenarios,
the hardware used, and the resulting performance there in some nice
graphs. I take it some here would be willing to contribute test results
from their test/production clusters. That way new ceph'ers know what to
expect from similar setups.



-----Original Message-----
From: Jordan Share [mailto:readmail@xxxxxxxxxx]
Sent: Monday, 15 July 2019 20:16
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Returning to the performance in a small
cluster topic

We found shockingly bad committed IOPS/latencies on ceph.

We could get roughly 20-30 IOPS when running this fio invocation from
within a vm:
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k
--numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1

For non-committed IO, we get about 2800 IOPS with this invocation:
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k
--numjobs=1 --size=2G --runtime=150 --group_reporting

So, maybe, if PostgreSQL has a lot of committed IO needs, you might not
have the performance you're expecting.

You could try running your fio tests with "--fsync=1" and see if those
numbers (which I'd expect to be very low) would be in line with your
PostgreSQL performance.
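Concretely, an 8k variant of the invocations above might look like this (same flags as before, with only the block size changed and fsync added; the runtime is an arbitrary choice):

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=8k
--numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1

If that reports numbers near the 20-30 IOPS range, the bottleneck is commit latency rather than raw throughput.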

Jordan


On 7/15/2019 7:08 AM, Drobyshevskiy, Vladimir wrote:
Dear colleagues,

    I would like to ask you for help with a performance problem on a
site backed by a Ceph storage backend. Cluster details are below.

    I've got a big problem with PostgreSQL performance. It runs inside
a VM on a virtio-scsi Ceph RBD image, and I see constant ~100% disk
load with latencies of up to hundreds of milliseconds (via atop) even
when pg_top shows only 10-20 tps. All other resources are almost
untouched: there is plenty of memory and free CPU cores, and the DB
fits in memory, but performance is still poor.

    The cluster itself:
    nautilus
    6 nodes, 7 SSDs with 2 OSDs per SSD (14 OSDs in total).
    Each node: 2x Intel Xeon E5-2665 v1 (governor = performance,
powersaving disabled), 64GB RAM, Samsung SM863 1.92TB SSD, QDR
Infiniband.

    I've done fio benchmarking in three configurations:
    a VM with the virtio-scsi driver,
    a bare-metal host with a mounted RBD image,
    and the same bare-metal host with a mounted LVM partition on the
SM863 SSD drive.

    I've set bs=8k (as PostgreSQL writes 8k blocks) and tried 1 and 8
jobs.

    Here are some results: https://pastebin.com/TFUg5fqA
    Drive load on the OSD hosts is very low, just a few percent.

    Here is my ceph config: https://pastebin.com/X5ZwaUrF

    The numbers don't look very good from my point of view, but they
are also not really bad (are they?). I just don't know which direction
to go next to solve the PostgreSQL problem.

    I've tried to make a RAID0 with mdraid and 2 virtual drives but
haven't noticed any difference.

    Could you please tell me:
    Are these performance numbers good or bad for this hardware?
    Is it possible to tune anything further? Maybe you can point me to
docs or other papers?
    Is there any special VM tuning for running PostgreSQL on Ceph?
    Thank you in advance!

--
Best regards,
Vladimir

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




