Re: Ceph + VMware + Single Thread Performance

From: Jake Young [mailto:jak3kaj@xxxxxxxxx]
Sent: 21 July 2016 13:24
To: nick@xxxxxxxxxx; wr@xxxxxxxx
Cc: Horace Ng <horace@xxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware + Single Thread Performance

 

My workaround to your single-threaded performance issue was to increase the thread count of the tgtd process (I added --nr_iothreads=128 as an argument to tgtd). This does help my workload.
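
If you want to try the same thing, the flag just needs to end up on the tgtd command line; where that lives depends on your distro. A minimal sketch using a systemd drop-in (the path and ExecStart lines below are illustrative, check your own tgtd unit):

# /etc/systemd/system/tgtd.service.d/iothreads.conf
[Service]
ExecStart=
ExecStart=/usr/sbin/tgtd -f --nr_iothreads=128

Then: systemctl daemon-reload && systemctl restart tgtd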

 

FWIW below are my rados bench numbers from my cluster with 1 thread:
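
For reference, these runs use the single-thread invocation quoted further down the thread (substitute your own pool name):

rados bench -p rbd 60 write -b 4M -t 1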

 

This first one is a "cold" run. This is a test pool, and it's not in use.  This is the first time I've written to it in a week (but I have written to it before). 

 

Total time run:         60.049311
Total writes made:      1196
Write size:             4194304
Bandwidth (MB/sec):     79.668
Stddev Bandwidth:       80.3998
Max bandwidth (MB/sec): 208
Min bandwidth (MB/sec): 0
Average Latency:        0.0502066
Stddev Latency:         0.47209
Max latency:            12.9035
Min latency:            0.013051

This next one is the 6th run. I honestly don't understand why there is such a huge performance difference. 

 

Total time run:         60.042933
Total writes made:      2980
Write size:             4194304
Bandwidth (MB/sec):     198.525
Stddev Bandwidth:       32.129
Max bandwidth (MB/sec): 224
Min bandwidth (MB/sec): 0
Average Latency:        0.0201471
Stddev Latency:         0.0126896
Max latency:            0.265931
Min latency:            0.013211

75 OSDs, all 2TB SAS spinners, spread across 9 OSD servers, each with a 2GB BBU RAID cache.

 

I have tuned my CPU c-states and set the frequency to max. I have 8x 2.5GHz cores per server, so just about one core per OSD. I have 40G networking. I don't use journals, but I have the RAID cache enabled.

Nick,

 

What NFS server are you using?

 

The kernel one. It seems to be working really well so far, now that I'm past the XFS fragmentation issues; I had to set an extent size hint of 16MB at the root.
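
In case it helps anyone else: the hint can be set with xfs_io on the root directory of the export, so newly created files inherit it (the path below is an assumption):

xfs_io -c "extsize 16m" /srv/nfs-datastore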

 

 

Jake

On Thursday, July 21, 2016, Nick Fisk <nick@xxxxxxxxxx> wrote:

I've had a lot of pain with this; smaller block sizes are even worse. You want to try and minimize latency at every point, as there
is no buffering happening in the iSCSI stack. This means:

1. Fast journals (NVMe or NVRAM)
2. 10Gb or better networking
3. Fast CPUs (GHz)
4. Fix CPU c-states to C1
5. Fix the CPU frequency to max (example commands for 4 and 5 below)
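
One way to do 4 and 5 on a typical Intel box (a sketch, assuming the cpupower utility is installed; exact values depend on your hardware and distro):

# 5. Pin the frequency governor to performance
cpupower frequency-set -g performance

# 4. Limit C-states to C1: append these to the kernel command line and reboot
#    intel_idle.max_cstate=1 processor.max_cstate=1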

Also, I can't be sure, but I think there is a metadata update happening with VMFS, particularly if you are using thin VMDKs; this
can also be a major bottleneck. For my use case, I've switched over to NFS, as it has given much more performance at scale and
less headache.
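
For illustration, the NFS side can be as simple as an XFS filesystem on a mapped RBD image exported by the kernel NFS server; the export below is a sketch (paths, hostnames and options are assumptions, not my exact config):

# /etc/exports
/srv/vmware-ds1   esxi01.example.com(rw,sync,no_root_squash)

# apply the export without restarting the server
exportfs -ra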

For the RADOS Run, here you go (400GB P3700):

Total time run:         60.026491
Total writes made:      3104
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     206.842
Stddev Bandwidth:       8.10412
Max bandwidth (MB/sec): 224
Min bandwidth (MB/sec): 180
Average IOPS:           51
Stddev IOPS:            2
Max IOPS:               56
Min IOPS:               45
Average Latency(s):     0.0193366
Stddev Latency(s):      0.00148039
Max latency(s):         0.0377946
Min latency(s):         0.015909

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Horace
> Sent: 21 July 2016 10:26
> To: wr@xxxxxxxx
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Ceph + VMware + Single Thread Performance
>
> Hi,
>
> Same here. I've read a blog post saying that VMware frequently verifies the locking on VMFS over iSCSI, hence it will have much
> slower performance than NFS (which uses a different locking mechanism).
>
> Regards,
> Horace Ng
>
> ----- Original Message -----
> From: wr@xxxxxxxx
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Thursday, July 21, 2016 5:11:21 PM
> Subject: Ceph + VMware + Single Thread Performance
>
> Hi everyone,
>
> we are seeing relatively slow single-thread performance on the iSCSI nodes of our cluster.
>
>
> Our setup:
>
> 3 Racks:
>
> 18x data nodes, 3 mon nodes, 3 iSCSI gateway nodes with tgt (rbd cache off).
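>
> For anyone unfamiliar, a tgt target backed by RBD is configured roughly like this in targets.conf (the names here are made up):
>
> <target iqn.2016-07.com.example:vmware-lun1>
>     driver iscsi
>     bs-type rbd
>     backing-store rbd/vmware-lun1
> </target>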
>
> 2x Samsung SM863 enterprise SSDs for journals (3 OSDs per SSD) and 6x WD
> Red 1TB drives per data node as OSDs.
>
> Replication = 3
>
> chooseleaf type rack in the CRUSH map (so each of the 3 replicas lands in a different rack)
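>
> The rule is roughly of this shape (a sketch, not our verbatim rule):
>
> rule replicated_racks {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type rack
>         step emit
> }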
>
>
> We only get about 90 MByte/s on the iSCSI gateway servers with:
>
> rados bench -p rbd 60 write -b 4M -t 1
>
>
> If we test with:
>
> rados bench -p rbd 60 write -b 4M -t 32
>
> we get about 600-700 MByte/s
>
>
> We plan to replace the Samsung SSDs with Intel DC P3700 PCIe NVMe drives
> for the journals to get better single-thread performance.
>
> Is there anyone out there with an Intel P3700 journal who can share
> test results from:
>
>
> rados bench -p rbd 60 write -b 4M -t 1
>
>
> Thank you very much !!
>
> Kind Regards !!
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


