Re: Returning to the performance in a small cluster topic

"Drobyshevskiy, Vladimir" <vlad@xxxxxxxxxx> · Tue, 16 Jul 2019 19:38:10 +0500

Hello, Paul!

You are effectively measuring the latency with jobs=1 here (which is appropriate considering that the WAL of a DB is effectively limited by latency) and yeah, a networked file system will always be a little bit slower than a local disk.

But I think you should be able to get higher performance here:
* It sometimes helps to disable the write cache on the disks: hdparm -W 0 /dev/sdX
The drives are connected via an LSI2308, and -W 0 does nothing. I changed the write cache via udev by setting queue/write_cache to "write through", and that does the trick.
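
For reference, the queue/write_cache approach described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual configuration: the function name set_write_through, its optional sysfs-root argument (there only so the helper can be tried against a scratch directory), and the udev rule file name are all assumptions, and sdX is a placeholder.

```shell
# Hypothetical helper: set a block device's queue/write_cache attribute
# to "write through".
# $1 = device name (e.g. sdX), $2 = sysfs root (defaults to /sys).
set_write_through() {
    dev=$1
    sysroot=${2:-/sys}
    attr="$sysroot/block/$dev/queue/write_cache"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi
    printf 'write through\n' > "$attr"
    cat "$attr"    # show the value now in effect
}

# Example (as root): set_write_through sdX
#
# A udev rule along these lines would reapply the setting at boot or on
# hotplug (the file name is an assumption):
#   /etc/udev/rules.d/99-write-cache.rules
#   ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/write_cache}="write through"
```

As far as I know, writing to this attribute changes how the kernel treats the device (whether it issues cache flushes) rather than reconfiguring the drive's own cache, so it is probably only safe on drives with power-loss protection.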

* Sometimes this helps: sysctl -w net.ipv4.tcp_low_latency=1
* More esoteric things such as pinning processes to NUMA nodes usually don't really help with latency (only with throughput).
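
One caveat on the sysctl suggestion: to my knowledge, net.ipv4.tcp_low_latency has been a legacy no-op since Linux 4.14, so it can only matter on older kernels. A minimal sketch of a check, assuming that version cutoff (the function name tcp_low_latency_has_effect is made up for illustration):

```shell
# Hypothetical helper: succeed if net.ipv4.tcp_low_latency can still change
# anything on the given kernel release (default: the running kernel).
# Assumption: the option became a no-op in Linux 4.14.
tcp_low_latency_has_effect() {
    release=${1:-$(uname -r)}
    major=${release%%.*}
    rest=${release#*.}
    minor=${rest%%[!0-9]*}
    [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -lt 14 ]; }
}

if tcp_low_latency_has_effect; then
    echo "tcp_low_latency may help; try: sysctl -w net.ipv4.tcp_low_latency=1"
else
    echo "tcp_low_latency is a legacy no-op on this kernel"
fi
```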

You can run "ceph daemon osd.X perf dump" to get really detailed statistics about how much time the OSD is spending on the individual steps.
Thanks a lot, I'll try to dig into the perf dump; there are too many numbers for a quick look.
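
One way to cut the perf dump down to something readable is to print only the counters that carry an avgcount/sum pair and rank them by their average. The following is a rough sketch that assumes jq is installed; the function name perf_averages is hypothetical, and osd.0 is a placeholder.

```shell
# Hypothetical helper: read "perf dump" JSON on stdin and print
# section, counter, and sum/avgcount for every counter that has both
# fields and a non-zero count, largest average first. Requires jq.
perf_averages() {
    jq -r '
        to_entries[]
        | .key as $section
        | .value | objects | to_entries[]
        | select(.value | objects | has("avgcount") and has("sum"))
        | select(.value.avgcount > 0)
        | [$section, .key, (.value.sum / .value.avgcount)]
        | @tsv
    ' | sort -t "$(printf '\t')" -k3,3gr
}

# Example on an OSD host:
#   ceph daemon osd.0 perf dump | perf_averages | head -n 20
```

Time counters are reported in seconds, but not every avgcount/sum pair is a time (some average sizes or counts), so the ranking is only a starting point for deciding which counters to read closely.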

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jul 15, 2019 at 8:16 PM Jordan Share <readmail@xxxxxxxxxx> wrote:
We found shockingly bad committed IOPS/latencies on ceph.

We could get roughly 20-30 IOPS when running this fio invocation from within a VM:

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k \
    --numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1

For non-committed IO, we get about 2800 IOPS with this invocation:

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k \
    --numjobs=1 --size=2G --runtime=150 --group_reporting
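
To put these two figures side by side: with numjobs=1 and fio's default queue depth of 1 there is only one I/O in flight, so IOPS is roughly the inverse of the time per operation. A small sketch of that arithmetic (the function name iops_to_ms is made up for illustration):

```shell
# Convert IOPS measured with a single I/O in flight into the average
# number of milliseconds each I/O takes.
iops_to_ms() {
    awk -v iops="$1" 'BEGIN { printf "%.2f\n", 1000 / iops }'
}

iops_to_ms 20     # committed writes, low end
iops_to_ms 30     # committed writes, high end
iops_to_ms 2800   # non-committed writes
```

That works out to roughly 33-50 ms per write-plus-fsync against about 0.36 ms per plain write.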

So, maybe, if PostgreSQL has a lot of committed IO needs, you might not have the performance you're expecting.

You could try running your fio tests with "--fsync=1" and see if those numbers (which I'd expect to be very low) would be in line with your PostgreSQL performance.
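
As an illustration of that suggestion, the invocation above can be adapted to the 8k block size used in the original tests. This is only a sketch: the job name, file path, and size are assumptions rather than the poster's actual parameters, and the helper prints the command for review instead of running it.

```shell
# Hypothetical helper: print (not run) an fio command for committed 8k
# sequential writes. $1 = path of a test file on the RBD-backed filesystem.
fsync_fio_cmd() {
    printf 'fio --name=walwrite --filename=%s --rw=write --bs=8k --direct=1 --ioengine=libaio --numjobs=1 --size=1G --runtime=60 --group_reporting --fsync=1\n' "$1"
}

fsync_fio_cmd /var/tmp/fio-wal-test
```

The printed command can then be run inside the VM and on the baremetal host to compare committed-write IOPS against the earlier non-fsync numbers.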

Jordan

On 7/15/2019 7:08 AM, Drobyshevskiy, Vladimir wrote:

> Dear colleagues,
>
>    I would like to ask you for help with a performance problem on a site
> backed by a Ceph storage backend. Cluster details are below.
>
>    I've got a big problem with PostgreSQL performance. It runs inside a
> VM with a virtio-scsi Ceph RBD image, and I see constant ~100% disk load
> with latencies of up to hundreds of milliseconds (via atop) even when
> pg_top shows 10-20 tps. All other resources are almost untouched: there
> is a lot of memory and free CPU cores, and the DB fits in memory but
> still has performance issues.
>
>    The cluster itself:
>    nautilus
>    6 nodes, 7 SSDs with 2 OSDs per SSD (14 OSDs overall).
>    Each node: 2x Intel Xeon E5-2665 v1 (governor = performance,
> powersaving disabled), 64GB RAM, Samsung SM863 1.92TB SSD, QDR Infiniband.
>
>    I've run fio benchmarks in three setups:
>    a VM with the virtio-scsi driver,
>    a baremetal host with a mounted rbd image,
>    and the same baremetal host with a mounted LVM partition on an SM863 SSD
> drive.
>
>    I've set bs=8k (as Postgres writes 8k blocks) and tried 1 and 8 jobs.
>
>    Here are some results: https://pastebin.com/TFUg5fqA
>    Drive load on the OSD hosts is very low, just a few percent.
>
>    Here is my ceph config: https://pastebin.com/X5ZwaUrF
>
>    The numbers don't look very good from my point of view, but they are also
> not really bad (are they?). But I don't really know which direction to
> take next to solve the problem with PostgreSQL.
>
>    I've tried to make a RAID0 with mdraid and 2 virtual drives but
> haven't noticed any difference.
>
>    Could you please tell me:
>    Are these performance numbers good or bad for this hardware?
>    Is it possible to tune anything more? Maybe you can point me to docs
> or other papers?
>    Does any special VM tuning for PostgreSQL/Ceph cooperation exist?
>    Thank you in advance!
>
> --
> Best regards,
> Vladimir
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards,
Vladimir