Re: thanks for a double check on ceph's config

Christian Balzer <chibi@xxxxxxx> · Tue, 10 May 2016 16:31:04 +0900

On Tue, 10 May 2016 11:48:07 +0800 Geocast wrote:

Hello,

> We have 21 hosts for ceph OSD servers, each host has 12 SATA disks (4TB
> each), 64GB memory.
No journal SSDs? 
What CPU(s) and network?

> ceph version 10.2.0, Ubuntu 16.04 LTS
> The whole cluster is new installed.
> 
> Can you help check what the arguments we put in ceph.conf is reasonable
> or not?
> thanks.
> 
> [osd]
> osd_data = /var/lib/ceph/osd/ceph-$id
> osd_journal_size = 20000
Overkill most likely, but not an issue.
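
For a sense of scale, the Ceph documentation suggests sizing the journal at
roughly twice the product of expected throughput and
filestore_max_sync_interval. A minimal Python sketch of that rule of thumb,
assuming about 150 MB/s sustained for one SATA HDD (an assumed figure, not
something measured on your hardware) and the 15 s max sync interval from
your config:

```python
def suggested_journal_mb(throughput_mb_s, max_sync_interval_s):
    """Rule of thumb: journal size = 2 * throughput * filestore_max_sync_interval."""
    return 2 * throughput_mb_s * max_sync_interval_s

# Assumed figure: ~150 MB/s sequential for one SATA HDD (hypothetical, measure your own).
hdd_throughput_mb_s = 150
max_sync_interval_s = 15   # filestore_max_sync_interval from the posted config
configured_mb = 20000      # osd_journal_size from the posted config

suggested_mb = suggested_journal_mb(hdd_throughput_mb_s, max_sync_interval_s)
print(f"suggested ~{suggested_mb} MB, configured {configured_mb} MB")
```

Under those assumptions the rule of thumb lands well below the configured
20000 MB, which is why the larger value is harmless but buys little.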

> osd_mkfs_type = xfs
> osd_mkfs_options_xfs = -f
> filestore_xattr_use_omap = true
> filestore_min_sync_interval = 10
Are you aware of what this does, and have you actually tested it (IOPS AND
throughput) against various other settings on your hardware to arrive at
this number?

> filestore_max_sync_interval = 15
That's fine in and of itself, and unlikely to ever be reached anyway.

> filestore_queue_max_ops = 25000
> filestore_queue_max_bytes = 10485760
> filestore_queue_committing_max_ops = 5000
> filestore_queue_committing_max_bytes = 10485760000
> journal_max_write_bytes = 1073714824
> journal_max_write_entries = 10000
> journal_queue_max_ops = 50000
> journal_queue_max_bytes = 10485760000
Same as above: have you tested these settings (from filestore_queue_max_ops
onward) against the defaults?

With HDDs only I'd expect any benefits to be small and/or things to become
very uneven once the HDDs are saturated. 
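
Converting the byte values to MiB makes them easier to compare. A small
Python sketch using only the numbers quoted above; the helper name to_mib is
just for illustration. It also checks journal_max_write_bytes against 1 GiB,
since the posted value looks like it might contain transposed digits (that
is a guess on my part, not something I can confirm):

```python
# Byte-valued settings exactly as posted in the [osd] section.
settings = {
    "filestore_queue_max_bytes": 10485760,
    "filestore_queue_committing_max_bytes": 10485760000,
    "journal_max_write_bytes": 1073714824,
    "journal_queue_max_bytes": 10485760000,
}

def to_mib(n_bytes):
    """Convert a byte count to MiB (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)

for name, value in settings.items():
    print(f"{name:40s} {to_mib(value):12.2f} MiB")

# 1 GiB is 1073741824 bytes; see how far the posted value is from it.
ONE_GIB = 2 ** 30
diff = ONE_GIB - settings["journal_max_write_bytes"]
print(f"journal_max_write_bytes is {diff} bytes short of 1 GiB")
```

Seen this way, filestore_queue_max_bytes is 10 MiB while the committing and
journal queues are 10000 MiB, so the byte limits are three orders of
magnitude apart.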

> osd_max_write_size = 512
> osd_client_message_size_cap = 2147483648
> osd_deep_scrub_stride = 131072
> osd_op_threads = 8
> osd_disk_threads = 4
> osd_map_cache_size = 1024
> osd_map_cache_bl_size = 128
> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
The nobarrier part is a potential recipe for disaster unless you have all
on-disk caches disabled and every other cache battery backed.

The only devices I trust to mount nobarrier are SSDs with power-loss
capacitors that have been proven to do the right thing (the Intel DC S
series among them).
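
To see which filesystems on a host are currently mounted that way, a sketch
along these lines could help. It parses text in /proc/mounts format and
assumes the kernel reports nobarrier in the options column; the function
name and the sample data are hypothetical.

```python
def mounts_with_nobarrier(mounts_text):
    """Return (device, mountpoint) pairs whose mount options include nobarrier.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        if "nobarrier" in options.split(","):
            hits.append((device, mountpoint))
    return hits

# Hypothetical sample; on a real host pass the contents of /proc/mounts instead.
sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noexec,nodev,noatime,nodiratime,nobarrier 0 0
"""
print(mounts_with_nobarrier(sample))
```

On an OSD host you would call it as
mounts_with_nobarrier(open("/proc/mounts").read()).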

> osd_recovery_op_priority = 4
> osd_recovery_max_active = 10
> osd_max_backfills = 4
> 
That's sane enough. 

> [client]
> rbd_cache = true
AFAIK that's the default with recent Ceph versions anyway.

> rbd_cache_size = 268435456

Are you sure that you have 256MB per client to waste on RBD cache?
If so, bully for you, but you might find that depending on your use case a
smaller RBD cache but more VM memory (for pagecache, SLAB, etc) could be
more beneficial. 
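
To put a number on it, here is a back-of-the-envelope Python sketch. The
cache size is the value from your config; the count of 40 clients per
hypervisor is purely hypothetical, so substitute your own.

```python
MIB = 1024 * 1024
GIB = 1024 ** 3

rbd_cache_size = 268435456   # bytes, from the posted [client] section
vm_count = 40                # hypothetical number of RBD clients on one host

per_client_mib = rbd_cache_size // MIB
total_gib = rbd_cache_size * vm_count / GIB
print(f"{per_client_mib} MiB per client, up to {total_gib:.1f} GiB for {vm_count} clients")
```

That is memory the host cannot hand to the VMs themselves, which is the
trade-off described above.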

> rbd_cache_max_dirty = 134217728
> rbd_cache_max_dirty_age = 5

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com