Hi,
the Samsung PM1725b is definitely a good choice when it comes to "lower"-priced enterprise SSDs. It costs pretty much the same as the Samsung PRO SSDs but offers a much higher DWPD rating and power-loss protection.
My benchmarks of the 3.2TB version in a PCIe 2.0 slot (the card itself is PCIe 3.0!):
fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
write: IOPS=154k, BW=601MiB/s (630MB/s)(35.2GiB/60003msec)
fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
write: IOPS=679, BW=2717MiB/s (2849MB/s)(159GiB/60005msec)
Regards,
Georg
On 24.10.19 21:21, Martin Verges wrote:
Hello,
think about migrating to a much faster and better Ceph version, and towards BlueStore, to increase the performance of the existing hardware.
If you want to go with a PCIe card, the Samsung PM1725b can provide quite good speeds, but at a much higher cost than the EVO. If you want to check drives, take a look at the uncached write latency: the lower the value, the better the drive.
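A minimal sketch of such a check, assuming the drive under test shows up as /dev/nvme0n1 and does not hold any data you care about (fio writes to the raw device):
fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=sync-write-latency
The completion latency ("clat") numbers in the fio output are the ones to compare between drives.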
On Thu, 24 Oct 2019 at 21:09, Hermann Himmelbauer <hermann@xxxxxxx> wrote:
Hi,
I am running a nice Ceph (Proxmox 4 / Debian 8 / Ceph 0.94.3) cluster on 3 nodes (Supermicro X8DTT-HIBQF), 2 OSDs each (2TB SATA hard disks), interconnected via 40 Gbit InfiniBand.
The problem is that the Ceph performance is quite bad (approx. 30 MiB/s reading, 3-4 MiB/s writing), so I thought about plugging a PCIe-to-NVMe/M.2 adapter into each node and installing SSDs. The idea is to get faster Ceph storage and also some extra capacity.
The question now is which SSDs I should use. If I understand it correctly, not every SSD is suitable for Ceph, as noted in the links below:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
or here:
https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
In the first link, the Samsung SSD 950 PRO 512GB NVMe is listed as a fast SSD for Ceph. As the 950 is no longer available, I ordered a Samsung 970 1TB for testing; unfortunately, the "EVO" instead of the PRO.
Before equipping all nodes with these SSDs, I did some tests with "fio" as recommended, e.g. like this:
fio --filename=/dev/DEVICE --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
The results are as follows:
-----------------------
1) Samsung 970 EVO NVMe M.2 with PCIe adapter
Jobs: 1:
read : io=26706MB, bw=445MiB/s, iops=113945, runt= 60001msec
write: io=252576KB, bw=4.1MiB/s, iops=1052, runt= 60001msec
Jobs: 4:
read : io=21805MB, bw=432.7MiB/s, iops=93034, runt= 60001msec
write: io=422204KB, bw=6.8MiB/s, iops=1759, runt= 60002msec
Jobs: 10:
read : io=26921MB, bw=448MiB/s, iops=114859, runt= 60001msec
write: io=435644KB, bw=7MiB/s, iops=1815, runt= 60004msec
-----------------------
So the read speed is impressive, but the write speed is really bad. Therefore I ordered the Samsung 970 PRO (1TB), as it has faster NAND chips (MLC instead of TLC). The results are, however, even worse for writing:
-----------------------
2) Samsung 970 PRO NVMe M.2 with PCIe adapter
Jobs: 1:
read : io=15570MB, bw=259.4MiB/s, iops=66430, runt= 60001msec
write: io=199436KB, bw=3.2MiB/s, iops=830, runt= 60001msec
Jobs: 4:
read : io=48982MB, bw=816.3MiB/s, iops=208986, runt= 60001msec
write: io=327800KB, bw=5.3MiB/s, iops=1365, runt= 60002msec
Jobs: 10:
read : io=91753MB, bw=1529.3MiB/s, iops=391474, runt= 60001msec
write: io=343368KB, bw=5.6MiB/s, iops=1430, runt= 60005msec
-----------------------
I did some research and found out that the "--sync" flag sets the O_DSYNC flag, which seems to disable the SSD's write cache and leads to these horrid write speeds. It seems this relates to the fact that the write cache is only left enabled for SSDs that implement some kind of battery/capacitor buffer which guarantees that cached data is flushed to flash in case of a power loss.
However, it seems impossible to find out which SSDs have this power-loss protection; moreover, these enterprise SSDs are crazy expensive compared to the SSDs above, and it's unclear whether power-loss protection is even available in the NVMe form factor. So building a 1 or 2 TB cluster does not really seem affordable/viable.
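One thing that can at least be queried directly is whether a controller reports a volatile write cache; a minimal sketch, assuming nvme-cli is installed and the controller shows up as /dev/nvme0:
nvme id-ctrl /dev/nvme0 | grep -i vwc    # VWC field of the Identify Controller data
nvme get-feature /dev/nvme0 -f 0x06      # feature 0x06 = Volatile Write Cache, current enable/disable state
As far as I know this only says whether the drive reports a volatile write cache, not whether that cache is protected against power loss; for the latter, the vendor datasheet (look for "power loss protection" or "PLP") seems to be the only reliable source.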
So, can anyone please give me hints on what to do? Is it possible to ensure that the write cache is not disabled in some way (my server is situated in a data center, so there will probably never be a loss of power)?
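As far as I can tell, the kernel keeps its own view of a device's write cache in sysfs, and overriding it is supposed to stop the kernel from issuing cache flushes at all; a sketch, assuming the device shows up as nvme0n1, and with the obvious caveat that it trades crash safety for speed:
cat /sys/block/nvme0n1/queue/write_cache                       # typically reports "write back" or "write through"
echo "write through" > /sys/block/nvme0n1/queue/write_cache    # kernel stops sending flushes; changes only the kernel's view, not the drive itself
But I am not sure whether that is a sane thing to do.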
Or is the link above already outdated, as newer Ceph releases somehow deal with this problem? Or maybe a later Debian release (10) will handle the O_DSYNC flag differently?
Perhaps I should simply invest in faster (and bigger) hard disks and forget the SSD-cluster idea?
Thank you in advance for any help,
Best Regards,
Hermann
--
hermann@xxxxxxx
PGP/GPG: 299893C7 (on keyservers)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx