Hello Chris,
We don't use SSDs as journals.

(1) osd_mkfs_type = xfs
osd_mkfs_options_xfs = -f
filestore_xattr_use_omap = true
(2) filestore_queue_max_ops = 25000
filestore_queue_max_bytes = 10485760
filestore_queue_committing_max_ops = 5000
filestore_queue_committing_max_bytes = 10485760000
journal_max_write_bytes = 1073714824
journal_max_write_entries = 10000
journal_queue_max_ops = 50000
journal_queue_max_bytes = 10485760000
(3) osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
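On point (3), a possibly safer variant of that mount line is sketched below. This assumes you have not verified that every cache in the write path is battery- or flash-backed (see Christian's warning further down); also, nodiratime is implied by noatime on Linux, so it is redundant:

```ini
[osd]
# Hypothetical safer variant: drop nobarrier unless every on-disk and
# controller cache is known to be battery/flash backed. nodiratime is
# already implied by noatime, so it can be dropped as well.
osd_mount_options_xfs = "rw,noexec,nodev,noatime"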
2016-05-10 15:31 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:
On Tue, 10 May 2016 11:48:07 +0800 Geocast wrote:
Hello,
> We have 21 hosts for ceph OSD servers, each host has 12 SATA disks (4TB
> each), 64GB memory.
No journal SSDs?
What CPU(s) and network?
> ceph version 10.2.0, Ubuntu 16.04 LTS
> The whole cluster is new installed.
>
> Can you help check what the arguments we put in ceph.conf is reasonable
> or not?
> thanks.
>
> [osd]
> osd_data = /var/lib/ceph/osd/ceph-$id
> osd_journal_size = 20000
Overkill most likely, but not an issue.
> osd_mkfs_type = xfs
> osd_mkfs_options_xfs = -f
> filestore_xattr_use_omap = true
> filestore_min_sync_interval = 10
Are you aware of what this does, and have you actually tested it (IOPS AND
throughput) against various other settings on your hardware to arrive at this
number?
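For comparison, the filestore defaults in this area are orders of magnitude smaller; the values below are as I recall them from the jewel-era filestore config reference, so verify them on your own build:

```ini
# Jewel-era defaults for comparison (verify on your build with
# "ceph daemon osd.0 config show"):
filestore_min_sync_interval = 0.01   # seconds; the posted value of 10 is 1000x this
filestore_max_sync_interval = 5      # seconds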
> filestore_max_sync_interval = 15
That's fine in and of itself; it is unlikely to ever be reached anyway.
> filestore_queue_max_ops = 25000
> filestore_queue_max_bytes = 10485760
> filestore_queue_committing_max_ops = 5000
> filestore_queue_committing_max_bytes = 10485760000
> journal_max_write_bytes = 1073714824
> journal_max_write_entries = 10000
> journal_queue_max_ops = 50000
> journal_queue_max_bytes = 10485760000
Same as above: have you tested these settings (from filestore_queue_max_ops
onward) compared to the defaults?
With HDDs only I'd expect any benefits to be small and/or things to become
very uneven once the HDDs are saturated.
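Annotating the posted byte values in human units makes the spread easier to see; the comments below are my own arithmetic, not part of the posted config:

```ini
filestore_queue_max_bytes = 10485760               # 10 MiB
filestore_queue_committing_max_bytes = 10485760000 # ~9.8 GiB, 1000x the queue cap above
journal_max_write_bytes = 1073714824               # ~1 GiB (exactly 1 GiB would be 1073741824; transposed digits?)
journal_queue_max_bytes = 10485760000              # ~9.8 GiB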
> osd_max_write_size = 512
> osd_client_message_size_cap = 2147483648
> osd_deep_scrub_stride = 131072
> osd_op_threads = 8
> osd_disk_threads = 4
> osd_map_cache_size = 1024
> osd_map_cache_bl_size = 128
> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
The nobarrier part is a potential recipe for disaster unless you have all
on-disk caches disabled and every other cache battery-backed.
The only devices I trust to mount nobarrier are SSDs with power-loss
protection capacitors that have been proven to do the right thing (Intel DC S
series amongst them).
> osd_recovery_op_priority = 4
> osd_recovery_max_active = 10
> osd_max_backfills = 4
>
That's sane enough.
> [client]
> rbd_cache = true
AFAIK that's the case with recent Ceph versions anyway.
> rbd_cache_size = 268435456
Are you sure that you have 256MB per client to waste on RBD cache?
If so, bully for you, but you might find that depending on your use case a
smaller RBD cache but more VM memory (for pagecache, SLAB, etc) could be
more beneficial.
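For example, a more conservative client section might look like the sketch below. The specific numbers are illustrative only; as I recall, the jewel defaults are 32 MiB for the cache and 24 MiB for the dirty limit, so verify on your own build:

```ini
[client]
rbd_cache = true
# Illustrative smaller cache; jewel defaults are believed to be
# 33554432 (32 MiB) and 25165824 (24 MiB) respectively -- verify with
# "ceph --show-config | grep rbd_cache".
rbd_cache_size = 67108864        # 64 MiB instead of the posted 256 MiB
rbd_cache_max_dirty = 50331648   # 48 MiB instead of the posted 128 MiB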
Christian
> rbd_cache_max_dirty = 134217728
> rbd_cache_max_dirty_age = 5
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com