Norman,
I'm cc'ing this back to ceph-users so that others can reply to it, or find it in future.
On 21/08/2018 12:01, Norman Gray wrote:
Willem Jan, hello.
Thanks for your detailed notes on my list question.
On 20 Aug 2018, at 21:32, Willem Jan Withagen wrote:
# zpool create -m/var/lib/ceph/osd/osd.0 osd.0 gpt/zd000 gpt/zd001
Over the weekend I updated the FreeBSD section of the Ceph manual with exactly that.
I'm not sure what sort of devices zd000 and zd001 are, but concatenating
devices seriously lowers the MTBF of the vdev. As such it is likely
better to create 2 OSDs on these 2 devices.
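For example, instead of concatenating, you could give each device its own pool
and OSD; a sketch along the lines of your own command (the osd numbers are just
illustrative):

# zpool create -m /var/lib/ceph/osd/osd.0 osd.0 gpt/zd000
# zpool create -m /var/lib/ceph/osd/osd.1 osd.1 gpt/zd001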
My sort-of problem is that the machine I'm doing this on was not specced
with Ceph in mind: it has 16 3.5TB disks. Given that
<http://docs.ceph.com/docs/master/start/hardware-recommendations/>
suggests that 20 is a 'high' number of OSDs on a host, I thought it
might be better to aim for an initial setup of 6 two-disk OSDs rather
than 12 one-disk ones (leaving four disks free).
That said, 12 < 20, so I think that, especially bearing in mind your
advice here, I should probably stick to 1-disk OSDs with one (default)
5GB SSD journal each, and not complicate things.
Only one way to find out: try both...
But I certainly do not advise putting concatenated disks in an OSD, especially
not for production. Break one disk and you break the whole vdev.
And the most important thing for OSDs is RAM: 1 GB per 1 TB of disk.
So with 70 TB of disk you'd need 64 GB of RAM or more, and preferably more than
that since ZFS will want its share as well.
CPU is not going to be that much of an issue, unless you have really tiny CPUs.
What I still have not figured out is what to do with the SSDs.
There are 3 things you can do (or any combination of them); a rough sketch of
the commands follows after this list:
1) Ceph standard: make it a journal. Mount the SSD on a separate dir and
get ceph-disk to use it as the journal.
2) Attach a ZFS cache (L2ARC) device to the pool, which will improve reads.
3) Attach a ZFS log (SLOG) device on SSD to the pool, to improve sync writes.
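As a sketch, using the pool and GPT labels from the listings below (how you
partition and label the SSD is up to you; gpt/osd.0.journal is just the label I
happen to use).

For 1), a small pool on an SSD partition, mounted where the journal will live:

# zpool create -m /usr/jails/ceph_0/var/lib/ceph/osd/osd.0/journal-ssd osd.0.journal gpt/osd.0.journal

For 2) and 3), add cache and log devices to the OSD pool:

# zpool add osd_1 cache gpt/osd.1.cache
# zpool add osd_1 log gpt/osd.1.log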
At the moment I'm doing all three:
[~] wjw@xxxxxxxxxxxxxxxxxxxx> zfs list
NAME           USED   AVAIL  REFER  MOUNTPOINT
osd.0.journal  316K   5.33G    88K  /usr/jails/ceph_0/var/lib/ceph/osd/osd.0/journal-ssd
osd.1.journal  316K   5.33G    88K  /usr/jails/ceph_1/var/lib/ceph/osd/osd.1/journal-ssd
osd.2.journal  316K   5.33G    88K  /usr/jails/ceph_2/var/lib/ceph/osd/osd.2/journal-ssd
osd.3.journal  316K   5.33G    88K  /usr/jails/ceph_3/var/lib/ceph/osd/osd.3/journal-ssd
osd.4.journal  316K   5.33G    88K  /usr/jails/ceph_4/var/lib/ceph/osd/osd.4/journal-ssd
osd.5.journal  316K   5.33G    88K  /usr/jails/ceph_0/var/lib/ceph/osd/osd.5/journal-ssd
osd.6.journal  316K   5.33G    88K  /usr/jails/ceph_1/var/lib/ceph/osd/osd.6/journal-ssd
osd.7.journal  316K   5.33G    88K  /usr/jails/ceph_2/var/lib/ceph/osd/osd.7/journal-ssd
osd_0          5.16G   220G  5.16G  /usr/jails/ceph_0/var/lib/ceph/osd/osd.0
osd_1          5.34G   219G  5.34G  /usr/jails/ceph_1/var/lib/ceph/osd/osd.1
osd_2          5.42G   219G  5.42G  /usr/jails/ceph_2/var/lib/ceph/osd/osd.2
osd_3          6.62G  1.31T  6.62G  /usr/jails/ceph_3/var/lib/ceph/osd/osd.3
osd_4          6.83G  1.75T  6.83G  /usr/jails/ceph_4/var/lib/ceph/osd/osd.4
osd_5          5.92G  1.31T  5.92G  /usr/jails/ceph_0/var/lib/ceph/osd/osd.5
osd_6          6.00G  1.31T  6.00G  /usr/jails/ceph_1/var/lib/ceph/osd/osd.6
osd_7          6.10G  1.31T  6.10G  /usr/jails/ceph_2/var/lib/ceph/osd/osd.7
[~] wjw@xxxxxxxxxxxxxxxxxxxx> zpool list -v osd_1
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
osd_1              232G  5.34G   227G        -         -    0%   2%  1.00x  ONLINE  -
  gpt/osd_1        232G  5.34G   227G        -         -    0%   2%
log                   -      -      -        -         -     -    -
  gpt/osd.1.log    960M    12K   960M        -         -    0%   0%
cache                 -      -      -        -         -     -    -
  gpt/osd.1.cache 22.0G  1.01G  21.0G        -         -    0%   4%
So each OSD has an SSD journal (a small ZFS dataset), and each OSD pool has a
cache and a log device. At the moment the cluster is idle, hence the log is
"empty".
But I would first work on the architecture of how you want the cluster to be,
and only then start tuning. ZFS log and cache devices are easily added and
removed after the fact.
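For example, to take them out again later (labels as in the listing above):

# zpool remove osd_1 gpt/osd.1.cache
# zpool remove osd_1 gpt/osd.1.log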
I found what appear to be a couple of typos in your script which I can
report back to you. I hope to make significant progress with this work
this week, so should be able to give you more feedback on the script, on
my experiences, and on the FreeBSD page in the manual.
Sure, keep 'em coming.
--WjW
I'll work through your various notes. Below are a couple of specific
points.
When I attempt to start the service, I get:
# service ceph start
=== mon.pochhammer ===
You're sort of free to pick names, but most of the time the tooling
expects these naming conventions:
mon: mon.[a-z]
osd: osd.[0-9]+
mgr: mgr.[x-z]
Using other names should work, but I'm not sure it works for all cases.
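The same names show up as section headers in ceph.conf; a minimal sketch (the
host value here is just taken from your mon name and may well differ on your
setup):

[mon.a]
    host = pochhammer

[osd.0]
    host = pochhammer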
Thanks! I wasn't sure if the restricted naming was just for demo
purposes. It's valuable to know that this is very firm advice.
It could also be a permissions thing. Most daemons used to run as root, but
"recently" they started running as user ceph:ceph.
Yes, I had to change ownership of a couple of files before getting this
far.
My mon.a directory looks like:
Aha!
Yup, it is an overwhelming set of tools, with little beginning or end.
I hadn't planned to be particularly Brave, here. But onward...
Best wishes,
Norman
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com