On Tue, Nov 21, 2017 at 10:46 AM, Christian Balzer <chibi@xxxxxxx> wrote:
On Tue, 21 Nov 2017 09:21:58 +0200 Rudi Ahlers wrote:
> On Mon, Nov 20, 2017 at 2:36 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> > On Mon, 20 Nov 2017 14:02:30 +0200 Rudi Ahlers wrote:
> >
> > > We're planning on installing 12 virtual machines with some heavy loads.
> > >
> > > the SSD drives are INTEL SSDSC2BA400G4
> > >
> > Interesting, where did you find those?
> > Or did you have them lying around?
> >
> > I've been unable to get DC S3710 SSDs for nearly a year now.
> >
>
> In South Africa, one of our suppliers had some in stock. They're still
> fairly new, about 2 months old now.
>
>
Odd, oh well.
>
>
> > > The SATA drives are ST8000NM0055-1RM112
> > >
> > Note that these (while fast) have an internal flash cache, limiting them to
> > something like 0.2 DWPD.
> > Probably not an issue with the WAL/DB on the Intels, but something to keep
> > in mind.
> >
>
>
> I don't quite understand what you mean; could you please explain?
>
See the other mails in this thread after the one above.
In short, probably nothing to worry about.
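If you want to keep an eye on the actual write volume anyway, smartctl will
show it. Attribute names and units vary by vendor and model, so treat this
as illustrative only (/dev/sdX is a placeholder):

smartctl -A /dev/sdX | grep -iE 'Total_LBAs_Written|Host_Writes'

Check the drive's documentation for the unit of that counter before
converting it to bytes and comparing against the rated endurance.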
>
>
> > > Please explain your comment, "b) will find a lot of people here who don't
> > > approve of it."
> > >
> > Read the archives.
> > Converged clusters are complex and debugging Ceph when tons of other
> > things are going on at the same time on the machine even more so.
> >
>
>
> Ok, so I have 4 physical servers and need to set up a highly redundant
> cluster. How else would you have done it? There is no budget for a SAN, let
> alone a highly available SAN.
>
As I said, I'd be fine doing it with Ceph, if that were a good match.
It's easy to starve resources with hyperconverged clusters.
Since you're using Proxmox, DRBD would be an obvious alternative,
especially if you're not planning on growing this cluster.
You only mentioned 3 servers so far; is the fourth one non-Ceph?
From what I have read, DRBD isn't very stable?
The 4th one will be for backups.
>
>
> >
> > > I don't have access to the switches right now, but they're new so whatever
> > > default config ships from factory would be active. Though iperf shows
> > > 10.5 GBytes / 9.02 Gbits/sec throughput.
> > >
> > Didn't think it was the switches, but for completeness' sake and all that.
> >
> > > What speeds would you expect?
> > > "Though with your setup I would have expected something faster, but NOT
> > the
> > > theoretical 600MB/s 4 HDDs will do in sequential writes."
> > >
> > What I wrote.
> > A 7200RPM HDD, even one of these, cannot sustain writes much over 170MB/s,
> > even in the most optimal circumstances.
> > So your cluster can NOT exceed about 600MB/s sustained writes with the
> > effective bandwidth of 4 HDDs.
> > Smaller writes/reads that can be cached by RAM, DB, onboard caches on the
> > HDDs of course can and will be faster.
> >
> > But again, you're missing the point: even if you get 600MB/s writes out of
> > your cluster, the number of 4k IOPS will be much more relevant to your VMs.
> >
> >
> hdparm shows about 230MB/s:
>
> root@virt2:~# hdparm -Tt /dev/sda
>
> /dev/sda:
> Timing cached reads: 20250 MB in 2.00 seconds = 10134.81 MB/sec
> Timing buffered disk reads: 680 MB in 3.00 seconds = 226.50 MB/sec
>
That's a read test, and a very optimized sequential one at that.
>
>
> 600MB/s would be super nice, but in reality even 400MB/s would be nice.
Do you really need to write that amount of data in a short time?
Typical VMs are IOPS bound, as pointed out several times.
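If you want a number that reflects what the VMs will actually see, a 4k
random write test is far more telling than hdparm. A minimal fio run against
a file on the RBD-backed storage would look something like the following
(fio must be installed; the path is only a placeholder):

fio --name=4k-randwrite --filename=/mnt/rbd-test/fio.dat --size=4G \
    --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=32 --runtime=60 --time_based --group_reporting

The IOPS figure it reports is the one worth comparing between setups.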
We have 10 physical servers which are quite busy, and two of them are slow in terms of disk speed, so I am looking at getting better performance.
> Would it not be achievable?
>
Maybe, but you need to find out what, if anything, makes your cluster
slower than this.
iostat, atop, etc can help with that.
How busy are your CPUs, HDDs and SSDs when you run that benchmark?
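For example, something like this in a second terminal while the benchmark
runs (from the sysstat package), watching %util and await for the HDDs and
the SSDs:

iostat -x 5

A device sitting near 100% util for the whole run is your bottleneck.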
The CPU and RAM are fairly "idle" during any of my tests.
>
>
> > >
> > >
> > > On this, "If an OSD has no fast WAL/DB, it will drag the overall speed
> > > down. Verify and if so fix this and re-test.": how?
> > >
> > No idea, I don't do bluestore.
> > You noticed the lack of a WAL/DB for sda, go and fix it.
> > If in doubt, by destroying and re-creating.
> >
> > And if you're looking for a less invasive procedure, docs and the ML
> > archive, but AFAIK there is nothing but re-creation at this time.
> >
>
>
> I use Proxmox, which set up a DB device but not a WAL device.
>
Again, I don't do bluestore.
But AFAIK, WAL will live on the fastest device, which is the SSD you've
put the DB on, unless specified separately.
So nothing to be done here.
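(For the record, if you ever do want to specify both explicitly when
re-creating an OSD, the ceph-disk sequence should be roughly the one below.
osd.4, /dev/sda and /dev/sde are placeholders, and this of course destroys
the data on that OSD:

systemctl stop ceph-osd@4
ceph osd out osd.4
ceph osd crush remove osd.4
ceph auth del osd.4
ceph osd rm osd.4
ceph-disk zap /dev/sda
ceph-disk prepare --bluestore /dev/sda --block.db /dev/sde --block.wal /dev/sde

Activation is normally triggered by udev once prepare finishes.)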
I have re-created the Ceph OSDs with both a DB and a WAL device this time, and performance is slightly better:
root@virt2:~# ceph-disk list | grep /dev/sdf | grep osd
/dev/sdb1 ceph data, active, cluster ceph, osd.5, block /dev/sdb2, block.db /dev/sdf1, block.wal /dev/sdf2
/dev/sdd1 ceph data, active, cluster ceph, osd.7, block /dev/sdd2, block.db /dev/sdf3, block.wal /dev/sdf4
root@virt2:~# ceph-disk list | grep /dev/sde | grep osd
/dev/sda1 ceph data, active, cluster ceph, osd.4, block /dev/sda2, block.db /dev/sde1, block.wal /dev/sde2
/dev/sdc1 ceph data, active, cluster ceph, osd.6, block /dev/sdc2, block.db /dev/sde3, block.wal /dev/sde4
root@virt2:~# rados bench -p Data 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 311 295 1179.73 1180 0.0498938 0.0520793
2 16 622 606 1211.78 1244 0.0358 0.0511329
3 16 934 918 1223.8 1248 0.0587524 0.0506744
Total time run: 3.420127
Total reads made: 986
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1153.17
Average IOPS: 288
Stddev IOPS: 9
Max IOPS: 312
Min IOPS: 295
Average Latency(s): 0.053413
Max latency(s): 0.284069
Min latency(s): 0.0166523
root@virt2:~# rados bench -p Data 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 381 365 1459.69 1460 0.00267135 0.04159
2 15 715 700 1399.75 1340 0.0934119 0.0441607
3 15 1079 1064 1418.44 1456 0.00258879 0.0435526
4 16 1448 1432 1431.77 1472 0.134513 0.0435446
5 16 1862 1846 1476.56 1656 0.017519 0.042301
6 16 2192 2176 1450.44 1320 0.00885603 0.0427858
7 16 2558 2542 1452.35 1464 0.00184139 0.0429065
8 16 2996 2980 1489.78 1752 0.0103593 0.04178
9 16 3385 3369 1497.12 1556 0.00866541 0.041612
10 16 3744 3728 1490.99 1436 0.00246718 0.0420014
Total time run: 10.204271
Total reads made: 3744
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1467.62
Average IOPS: 366
Stddev IOPS: 33
Max IOPS: 438
Min IOPS: 330
Average Latency(s): 0.0427017
Max latency(s): 0.453643
Min latency(s): 0.00143035
root@virt2:~# rados bench -p Data 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_virt2_20816
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 106 90 359.981 360 0.211947 0.164055
2 16 202 186 371.956 384 0.101829 0.161727
3 16 312 296 394.616 440 0.142682 0.157926
4 16 414 398 397.946 408 0.17893 0.157207
5 16 515 499 399.147 404 0.138521 0.157384
6 16 609 593 395.281 376 0.197496 0.159185
7 16 703 687 392.521 376 0.148057 0.160965
8 16 796 780 389.952 372 0.360846 0.161464
9 16 907 891 395.951 444 0.0697599 0.160687
10 16 989 973 389.153 328 0.164584 0.161334
Total time run: 10.125151
Total writes made: 990
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 391.105
Stddev Bandwidth: 35.6302
Max bandwidth (MB/sec): 444
Min bandwidth (MB/sec): 328
Average IOPS: 97
Stddev IOPS: 8
Max IOPS: 111
Min IOPS: 82
Average Latency(s): 0.163488
Stddev Latency(s): 0.0623322
Max latency(s): 0.451163
Min latency(s): 0.0416428
As noted, the IOPS are still very, very low. What could cause that?
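Would it be worth rerunning the write bench with a 4k block size to measure
small writes directly? If I read the docs right, something like

rados bench -p Data 10 write -t 16 -b 4096 --no-cleanup

should report an IOPS figure much closer to what the VMs experience, even
though the MB/s number will look tiny.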
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com