Re: ceph configuration; Was: FreeBSD rc.d script: sta.rt not found

Norman,

I'm cc'ing this back to ceph-users so that others can reply to it, or find it in the future.

On 21/08/2018 12:01, Norman Gray wrote:

Willem Jan, hello.

Thanks for your detailed notes on my list question.

On 20 Aug 2018, at 21:32, Willem Jan Withagen wrote:

     # zpool create -m/var/lib/ceph/osd/osd.0 osd.0 gpt/zd000 gpt/zd001

Over the weekend I updated the FreeBSD section of the Ceph manual with exactly that. I'm not sure what sort of devices zd000 and zd001 are, but concatenating devices seriously lowers the MTBF of the vdev. As such it is likely better to create 2 OSDs on these 2 devices.

My sort-of problem is that the machine I'm doing this on was not specced with Ceph in mind: it has 16 3.5TB disks.  Given that <http://docs.ceph.com/docs/master/start/hardware-recommendations/> suggests that 20 is a 'high' number of OSDs on a host, I thought it might be better to aim for an initial setup of 6 two-disk OSDs rather than 12 one-disk ones (leaving four disks free).

That said, 12 < 20, so I think that, especially bearing in mind your advice here, I should probably stick to 1-disk OSDs with one (default) 5GB SSD journal each, and not complicate things.

Only one way to find out: try both...
But I certainly do not advise putting concatenated disks in an OSD, especially not for production. Break one disk and you break the whole vdev.
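For example, reusing the gpt/zd000 and gpt/zd001 labels from the command quoted above, two single-disk pools (one per OSD) would look something like:

     # zpool create -m /var/lib/ceph/osd/osd.0 osd.0 gpt/zd000
     # zpool create -m /var/lib/ceph/osd/osd.1 osd.1 gpt/zd001

Losing a disk then costs you one OSD instead of the whole vdev, and Ceph re-replicates that OSD's data elsewhere.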

And the most important thing for OSDs is RAM: 1 GB per 1 TB of disk.
So with 70 TB of disk you'd want 64 GB of RAM or more, preferably more since ZFS will want its share as well. CPU is not going to be that much of an issue, unless you have really tiny CPUs.

What I still have not figured out is what to do with the SSDs.
There are 3 things you can do (or any combination of them); a rough sketch in commands follows the list:
1) Ceph standard: make it a journal. Mount the SSD on a separate dir and
	get ceph-disk to start using it as the journal.
2) Attach a ZFS cache device to the vdev, which will improve reading.
3) Attach a ZFS log device on SSD to the vdev to improve sync writing.
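Roughly, per option (the journal pool name and the gpt/ssd0-journal1 label are made-up examples; gpt/osd.1.cache and gpt/osd.1.log match the layout I show below):

     1) # zpool create -m /var/lib/ceph/osd/osd.1/journal-ssd osd.1.journal gpt/ssd0-journal1
     2) # zpool add osd_1 cache gpt/osd.1.cache
     3) # zpool add osd_1 log gpt/osd.1.log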

At the moment I'm doing all three:
[~] wjw@xxxxxxxxxxxxxxxxxxxx> zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
osd.0.journal   316K  5.33G    88K  /usr/jails/ceph_0/var/lib/ceph/osd/osd.0/journal-ssd
osd.1.journal   316K  5.33G    88K  /usr/jails/ceph_1/var/lib/ceph/osd/osd.1/journal-ssd
osd.2.journal   316K  5.33G    88K  /usr/jails/ceph_2/var/lib/ceph/osd/osd.2/journal-ssd
osd.3.journal   316K  5.33G    88K  /usr/jails/ceph_3/var/lib/ceph/osd/osd.3/journal-ssd
osd.4.journal   316K  5.33G    88K  /usr/jails/ceph_4/var/lib/ceph/osd/osd.4/journal-ssd
osd.5.journal   316K  5.33G    88K  /usr/jails/ceph_0/var/lib/ceph/osd/osd.5/journal-ssd
osd.6.journal   316K  5.33G    88K  /usr/jails/ceph_1/var/lib/ceph/osd/osd.6/journal-ssd
osd.7.journal   316K  5.33G    88K  /usr/jails/ceph_2/var/lib/ceph/osd/osd.7/journal-ssd
osd_0          5.16G   220G  5.16G  /usr/jails/ceph_0/var/lib/ceph/osd/osd.0
osd_1          5.34G   219G  5.34G  /usr/jails/ceph_1/var/lib/ceph/osd/osd.1
osd_2          5.42G   219G  5.42G  /usr/jails/ceph_2/var/lib/ceph/osd/osd.2
osd_3          6.62G  1.31T  6.62G  /usr/jails/ceph_3/var/lib/ceph/osd/osd.3
osd_4          6.83G  1.75T  6.83G  /usr/jails/ceph_4/var/lib/ceph/osd/osd.4
osd_5          5.92G  1.31T  5.92G  /usr/jails/ceph_0/var/lib/ceph/osd/osd.5
osd_6          6.00G  1.31T  6.00G  /usr/jails/ceph_1/var/lib/ceph/osd/osd.6
osd_7          6.10G  1.31T  6.10G  /usr/jails/ceph_2/var/lib/ceph/osd/osd.7

[~] wjw@xxxxxxxxxxxxxxxxxxxx> zpool list -v osd_1
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
osd_1              232G  5.34G   227G        -         -     0%     2%  1.00x  ONLINE  -
  gpt/osd_1        232G  5.34G   227G        -         -     0%     2%
log                   -      -      -        -         -      -      -
  gpt/osd.1.log    960M    12K   960M        -         -     0%     0%
cache                 -      -      -        -         -      -      -
  gpt/osd.1.cache  22.0G  1.01G  21.0G       -         -     0%     4%

So each OSD has an SSD journal (a ZFS volume), and each OSD pool has a cache and a log device. At the moment the cluster is idle, hence the log is "empty".

But I would first work out the architecture of how you want the cluster to be, and then start tuning. ZFS log and cache devices are easily added and removed after the fact.
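For example, with the device names from the zpool output above (both operations work on a live pool):

     # zpool add osd_1 cache gpt/osd.1.cache
     # zpool remove osd_1 gpt/osd.1.cache
     # zpool add osd_1 log gpt/osd.1.log
     # zpool remove osd_1 gpt/osd.1.log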

I found what appear to be a couple of typos in your script which I can report back to you.  I hope to make significant progress with this work this week, so should be able to give you more feedback on the script, on my experiences, and on the FreeBSD page in the manual.

Sure, keep'm coming

--WjW


I'll work through your various notes.  Below are a couple of specific points.

When I attempt to start the service, I get:

# service ceph start
=== mon.pochhammer ===

You're sort of free to pick names, but most of the time the tooling expects these naming conventions:
    mon: mon.[a-z]
    osd: osd.[0-9]+
    mgr: mgr.[x-z]

Using other names should work, but I'm not sure it works for all cases.
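For what it's worth, these names map onto ceph.conf sections along these lines (the host names here are just placeholders):

     [mon.a]
         host = monhost-a
     [osd.0]
         host = osdhost-0
     [mgr.x]
         host = mgrhost-x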

Thanks!  I wasn't sure if the restricted naming was just for demo purposes.  It's valuable to know that this is very firm advice.

Could also be a permissions thing. Most daemons used to run as root, but "recently" they started running as user ceph:ceph.

Yes, I had to change ownership of a couple of files before getting this far.
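Something along these lines, using the OSD data directory from the example further up:

     # chown -R ceph:ceph /var/lib/ceph/osd/osd.0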

My mon.a directory looks like:

Aha!

Yup, it is an overwhelming set of tools, with little beginning or end.

I hadn't planned to be particularly brave here. But onward...

Best wishes,

Norman






