That cannot be correct.
Check it on your cluster with dstat as I said...
You will see parallel IO on every OSD and journal at every node...
On 21.07.16 at 15:02, Jake Young wrote:
I think the answer is that with 1 thread you can only
ever write to one journal at a time. Theoretically, you would need
10 threads to be able to write to 10 nodes at the same time.
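A quick way to see this (just a sketch, assuming the default rbd pool on a 10-node cluster; pool name, runtime and thread count are placeholders):
# one writer: throughput is bounded by the latency of a single journal at a time
rados bench -p rbd 60 write -b 4M -t 1
# ten writers: objects are dispatched to ~10 primary OSDs concurrently, so aggregate bandwidth should scale up
rados bench -p rbd 60 write -b 4M -t 10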
Jake
On Thursday, July 21, 2016, wr@xxxxxxxx <wr@xxxxxxxx> wrote:
What I do not really understand is this:
Let's say the Intel P3700 does 200 MByte/s in a single-thread rados bench... see Nick's results below...
Now take multiple OSD nodes, for example 10 nodes, each with exactly one P3700 NVMe built in.
Why is the single-thread performance on the RBD client still exactly 200 MByte/s with a 10-node OSD cluster?
I would expect 10 nodes * 200 MByte/s = 2000 MByte/s.
Everyone can check this on their own cluster:
dstat -D sdb,sdc,sdd,sdX ....
You will see that Ceph stripes the data over all OSDs in the cluster if you test from the client side with rados bench...
rados bench -p rbd 60 write -b 4M -t 1
Is there not a way to enable the Linux page cache, i.e. not use D_SYNC?
That would improve performance dramatically.
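For what it's worth, the effect of the sync flag can be seen directly on a journal device with fio (a minimal sketch; /dev/nvme0n1 is a placeholder for the journal device, and this test overwrites data on it):
# --sync=1 opens the device with O_SYNC, so every write waits for the device, as the Ceph journal does
fio --name=sync-test --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
# same write stream without O_SYNC, so the device can absorb and coalesce the writes
fio --name=nosync-test --filename=/dev/nvme0n1 --direct=1 --sync=0 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting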
On 21.07.16 at 14:33, Nick Fisk wrote:
-----Original Message-----
From: wr@xxxxxxxx [mailto:wr@xxxxxxxx]
Sent: 21 July 2016 13:23
To: nick@xxxxxxxxxx; 'Horace Ng' <horace@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware + Single
Thread Performance
Okay, and what is your plan now to speed things up?
Now that I have come up with a lower-latency hardware design, there is not much further improvement possible until persistent RBD caching is implemented, as that will move the SSD/NVMe closer to the client. But I'm happy with what I can achieve at the moment. You could also experiment with bcache on the RBD.
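A rough sketch of the bcache idea, assuming an already-mapped RBD at /dev/rbd0 and a spare local NVMe partition at /dev/nvme0n1p1 (both placeholders; writeback mode trades data safety for latency):
# bind the RBD as the backing device and the local NVMe partition as the cache
make-bcache -C /dev/nvme0n1p1 -B /dev/rbd0
# once the kernel has registered the device (udev usually does this), switch it to writeback
# so small synchronous writes are absorbed locally instead of waiting on the cluster
echo writeback > /sys/block/bcache0/bcache/cache_mode
# then use /dev/bcache0 instead of /dev/rbd0 from here on
mkfs.xfs /dev/bcache0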
Would it help to put multiple P3700s in each OSD node to improve performance for a single thread (for example Storage vMotion)?
Most likely not; it's all the other parts of the puzzle that are causing the latency. ESXi was designed for storage arrays that service IOs in the 100us-1ms range; Ceph is probably about 10x slower than this, hence the problem. Disable the BBWC on a RAID controller or SAN and you will see the same behaviour.
Regards
On 21.07.16 at 14:17, Nick Fisk wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf
Of wr@xxxxxxxx
Sent: 21 July 2016 13:04
To: nick@xxxxxxxxxx; 'Horace Ng'
<horace@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware + Single
Thread Performance
Hi,
Hmm, I think 200 MByte/s is really bad. Is your cluster in production right now?
It's just been built, not running yet.
So if you start a storage migration, you get only 200 MByte/s, right?
I wish. My current cluster (not this new one) would storage migrate at ~10-15 MB/s. Serial latency is the problem: without being able to buffer, ESXi waits for an ack for each IO before sending the next. It also submits the migrations in 64 KB chunks unless you get VAAI working; I think ESXi will try to do them in parallel, which will help as well.
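As a rough back-of-the-envelope illustration (the per-IO latency is an assumed figure, not measured here): at around 4 ms per synchronous 64 KB write, a single stream tops out at roughly 64 KB / 0.004 s ≈ 16 MB/s, which is in the same ballpark as the 10-15 MB/s above.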
I think it would be awesome if you got 1000 MByte/s.
Where is the bottleneck?
Latency serialisation: without a buffer, you can't drive the devices to 100%. With buffered IO (or high queue depths) I can max out the journals.
A fio test from Sebastien Han gives us 400 MByte/s raw performance from the P3700:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
How could it be that the rbd client performance is
50% slower?
Regards
On 21.07.16 at 12:15, Nick Fisk wrote:
I've had a lot of pain with this; smaller block sizes are even worse. You want to try and minimize latency at every point, as there is no buffering happening in the iSCSI stack. This means:
1. Fast journals (NVMe or NVRAM)
2. 10Gb or better networking
3. Fast CPUs (GHz)
4. Fix CPU C-states to C1
5. Fix CPU frequency to max (see the sketch below for 4 and 5)
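One common way to do items 4 and 5 on an Intel box (a sketch; the exact knobs depend on the distro and CPU):
# kernel boot parameters (e.g. appended to GRUB_CMDLINE_LINUX) to keep cores out of deep C-states
intel_idle.max_cstate=1 processor.max_cstate=1
# pin the frequency governor to performance on all cores (cpupower comes with the linux-tools package)
cpupower frequency-set -g performance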
Also, I can't be sure, but I think there is a metadata update happening with VMFS, particularly if you are using thin VMDKs; this can also be a major bottleneck. For my use case I've switched over to NFS, as it has given much more performance at scale and less headache.
For the RADOS Run, here
you go (400GB P3700):
Total time run: 60.026491
Total writes made: 3104
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 206.842
Stddev Bandwidth: 8.10412
Max bandwidth (MB/sec): 224
Min bandwidth (MB/sec): 180
Average IOPS: 51
Stddev IOPS: 2
Max IOPS: 56
Min IOPS: 45
Average Latency(s): 0.0193366
Stddev Latency(s): 0.00148039
Max latency(s): 0.0377946
Min latency(s): 0.015909
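As a sanity check on those numbers: with a single outstanding 4 MB write and an average latency of ~0.0193 s, the expected throughput is 4 MB / 0.0193 s ≈ 207 MB/s, essentially the reported 206.8 MB/sec. In other words, single-thread bandwidth here is purely latency-bound.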
Nick
-----Original
Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On
Behalf Of Horace
Sent: 21 July 2016 10:26
To: wr@xxxxxxxx
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware +
Single Thread Performance
Hi,
Same here. I've read a blog post saying that VMware will frequently verify the locking on VMFS over iSCSI, hence it has much slower performance than NFS (which uses a different locking mechanism).
Regards,
Horace Ng
----- Original Message -----
From: wr@xxxxxxxx
To: ceph-users@xxxxxxxxxxxxxx
Sent: Thursday, July 21, 2016 5:11:21 PM
Subject: Ceph + VMware + Single
Thread Performance
Hi everyone,
we see relatively slow single-thread performance on the iSCSI nodes of our cluster.
Our setup:
3 racks:
18x data nodes, 3 mon nodes, 3 iSCSI gateway nodes with tgt (rbd cache off).
2x Samsung SM863 enterprise SSDs for journals (3 OSDs per SSD) and 6x WD Red 1TB per data node as OSDs.
Replication = 3
chooseleaf type rack in the CRUSH map (so the replicas are spread over the 3 racks)
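For reference, this is roughly what such a rule looks like in a decompiled CRUSH map (a sketch with placeholder names and numbers, not our exact map):
rule replicated_rack {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        # pick one leaf (OSD) from each of the required number of racks
        step chooseleaf firstn 0 type rack
        step emit
}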
We get only ca. 90 MByte/s on the iSCSI gateway servers with:
rados bench -p rbd 60 write -b 4M -t 1
If we test with:
rados bench -p rbd 60 write -b 4M -t 32
we get ca. 600-700 MByte/s.
We plan to replace the Samsung SSDs with Intel DC P3700 PCIe NVMe drives for the journals to get better single-thread performance.
Is there anyone out there who has an Intel P3700 as a journal and can share test results from:
rados bench -p rbd 60 write -b 4M -t 1
Thank you very much!!
Kind regards!!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com