Ashley Merrick wrote:
Correct, in a large cluster there is no problem.
I was talking about Wladimir's setup, where they are running a single node with
a failure domain of OSD, which would mean a loss of all OSDs and all data.
Sure, I am aware that running with one NVMe is risky, so we have a plan to
add a mirroring NVMe to it at some point in the future. I hope this can be solved
with a simple mdadm+lvm scheme.
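Roughly, the mdadm+lvm scheme I have in mind is something like this (just a
sketch, not tested; the device names, VG name and LV sizes are placeholders
for our hardware):

    # Mirror the two NVMe devices with mdadm.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

    # Put LVM on top of the mirror and carve one DB/WAL LV per OSD.
    pvcreate /dev/md0
    vgcreate nvme_mirror /dev/md0
    for i in $(seq 0 7); do lvcreate -L 60G -n osd-db-$i nvme_mirror; done

    # A rebuilt OSD would then point its BlueStore DB at a mirrored LV, e.g.:
    # ceph-volume lvm create --bluestore --data /dev/sdb --block.db nvme_mirror/osd-db-0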
Btw, are there any recommendations on the cheapest Ceph node hardware? Now
I understand that 8x3TB HDDs in a single host is quite a centralized
setup, and I have a feeling that a good Ceph cluster should have at least
as many hosts as OSDs per host: with 8 OSDs per host, at least 8 hosts, or
at least 3 hosts with 3 OSDs each. Right? Then it would be reasonable to
add a single NVMe per host, so that any component of a host can fail within
failure domain = host.
I am still thinking within the cheapest concept of multiple HDDs + a
single NVMe per host.
---- On Sun, 22 Sep 2019 03:42:52 +0800 solarflow99
<solarflow99@xxxxxxxxx> wrote ----
Now my understanding is that an NVMe drive is recommended to help
speed up BlueStore. If it were to fail, then those OSDs would be
lost, but assuming there is 3x replication and enough OSDs, I don't
see the problem here. There are other scenarios where a whole
server might be lost; that doesn't mean the total loss of the cluster.
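For what it's worth, that assumption is easy to verify on a given cluster
(the pool name "rbd" below is just an example):

    ceph osd pool get rbd size        # expect 3 for 3x replication
    ceph osd pool get rbd min_size
    ceph osd tree                     # OSDs grouped under their hosts
    ceph osd crush rule dump          # shows the failure domain of each rule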
On Sat, Sep 21, 2019 at 5:27 AM Ashley Merrick
<singapore@xxxxxxxxxxxxxx> wrote:
Placing it as a journal / BlueStore DB/WAL will mostly help with writes;
by the sounds of it you want to increase read performance?
How important is the data on this Ceph cluster?
If you place it as a journal / DB/WAL, any failure of it will cause
total data loss, so I would very much advise against this unless
this is purely for testing and total data loss is not an issue.
In that case it is worth upgrading to BlueStore by rebuilding each
OSD and placing the DB/WAL on an SSD partition. You can do this one
OSD at a time, but there is no migration path, so you would need
to wait for data to rebuild after each OSD change before moving
onto the next.
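Very roughly, the per-OSD rebuild looks something like the following
(sketch only; osd.0, /dev/sdb and /dev/nvme0n1p1 are placeholders, and you
want the cluster back to HEALTH_OK before moving on):

    # Drain one OSD and wait for recovery to finish (watch "ceph -s").
    ceph osd out osd.0

    # Once recovery is done, stop and remove it.
    systemctl stop ceph-osd@0
    ceph osd purge 0 --yes-i-really-mean-it

    # Wipe the HDD and recreate the OSD with its DB/WAL on the SSD partition.
    ceph-volume lvm zap /dev/sdb --destroy
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

    # Wait for backfill to complete before starting the next OSD.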
If you need to make sure your data is safe then you're really
limited to using it as a read-only cache, but I think even then
most setups would cause all OSDs to go offline until you
manually removed the failed disk from the read-only cache.
bcache/dm-cache may handle this automatically, however it
is still a risk that I personally wouldn't want to take.
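If you do try the cache route, a bcache setup in a read-oriented mode would
look roughly like this (sketch; device names are placeholders, and
writethrough/writearound keep the backing HDD authoritative so losing the
cache device should not lose data):

    # Format the HDD as a backing device and the NVMe partition as a cache device.
    make-bcache -B /dev/sdb
    make-bcache -C /dev/nvme0n1p1

    # Attach the cache set to the backing device
    # (UUID from "bcache-super-show /dev/nvme0n1p1").
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

    # writethrough (or writearound) so the backing HDD always holds the full data.
    echo writethrough > /sys/block/bcache0/bcache/cache_mode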
Also, the best option really depends on your use case for Ceph and the
I/O activity you expect.
---- On Fri, 20 Sep 2019 14:56:12 +0800 Wladimir Mutel
<mwg@xxxxxxxxx> wrote ----
Dear everyone,
Last year I set up an experimental Ceph cluster (still single node,
failure domain = osd, MB Asus P10S-M WS, CPU Xeon E3-1235L, RAM 64 GB,
HDDs WD30EFRX, Ubuntu 18.04, now with kernel 5.3.0 from the Ubuntu
mainline PPA and Ceph 14.2.4 from
download.ceph.com/debian-nautilus/dists/bionic).
I set up a JErasure 2+1 pool, created some RBDs using it as their data
pool, and exported them over iSCSI (using tcmu-runner, gwcli and
associated packages). But with an HDD-only setup their performance was
less than stellar, not even saturating 1Gbit Ethernet on RBD reads.
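For context, the pools and images were created roughly like this (from
memory, so the names and PG counts are only illustrative):

    # 2+1 erasure-coded data pool with failure domain = osd.
    ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=osd
    ceph osd pool create rbd-ec-data 64 64 erasure ec21
    ceph osd pool set rbd-ec-data allow_ec_overwrites true

    # Replicated pool for RBD metadata, with image data going to the EC pool.
    ceph osd pool create rbd-meta 64 64 replicated
    rbd pool init rbd-meta
    rbd create rbd-meta/test-image --size 100G --data-pool rbd-ec-data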
This year my experiment was funded with a Gigabyte PCIe NVMe 1TB SSD
(GP-ASACNE2100TTTDR). It is now plugged into the MB and is visible as a
storage device to lsblk. I can also see its 4 interrupt queues in
/proc/interrupts, and its transfer rate measured by hdparm -t is
about 2.3 GB/sec.
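The checks were basically just (assuming the drive shows up as /dev/nvme0n1):

    lsblk /dev/nvme0n1            # visible as a block device
    grep nvme /proc/interrupts    # its interrupt queues
    hdparm -t /dev/nvme0n1        # sequential read, about 2.3 GB/sec here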
And now I want to ask your advice on how to best include it in this
already existing setup. Should I allocate it for OSD journals and
databases? Is there a way to reconfigure an existing OSD in this way
without destroying and recreating it? Or are there plans to ease this
kind of migration? Can I add it as a write-absorbing cache to
individual RBD images? To individual block devices at the level of
bcache/dm-cache? What about speeding up RBD reads?
I would appreciate reading your opinions and recommendations.
(I just want to warn you that in this situation I don't have the
financial option of going full-SSD.)
Thank you all in advance for your responses.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com