Re: Optimal or recommended threads values

Don't forget the number of cores in the node. Basically you want enough threads to keep all of the cores busy while not having so many that you end up with a ton of context switching overhead. Also, as you said, there are a lot of other factors that may have an effect, like the number of AGs (assuming XFS), scheduler, HBA, etc. What I found a while back was that increasing the OSD op thread count to ~8 helped reads in some cases on the node I was testing, but could hurt write performance if increased too high. Increasing the other thread counts didn't make enough of a difference to be able to discern whether they helped or hurt.
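As a very rough illustration of the cores-vs-threads balance described above, here is a sketch of how you might derive a starting point for osd_op_threads from the node's hardware. The node size, OSD count, and the floor of 2 are all assumptions for the example, not recommendations:

```shell
# Hypothetical sizing sketch: divide hardware threads across the OSDs on
# the node so all cores can stay busy without heavy context switching.
cores=16   # assumed hardware threads on the node; use $(nproc) on a real box
osds=8     # assumed number of OSDs hosted on this node
threads=$(( cores / osds ))
# Keep a floor of 2 op threads per OSD so a single slow request
# can't stall the whole daemon (assumed floor, tune from benchmarks).
[ "$threads" -lt 2 ] && threads=2
echo "osd_op_threads = $threads"
```

Any value this produces is only a starting point; as noted below in the thread, the real answer comes from benchmarking under production load.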

It may be different now though, with all of the improvements that have gone into Giant.

Mark

On 11/24/2014 06:33 PM, Craig Lewis wrote:
Tuning these values depends on a lot more than just the SSDs and HDDs.
Which kernel and IO scheduler are you using?  Does your HBA do write
caching?  It also depends on what your goals are.  Tuning for a RadosGW
cluster is different than for an RBD cluster.  The short answer is that
you are the only person who can tell you what your optimal values
are.  As always, the best benchmark is production load.


In my small cluster (5 nodes, 44 osds), I'm optimizing to minimize
latency during recovery.  When the cluster is healthy, bandwidth and
latency are more than adequate for my needs.  Even with journals on
SSDs, I've found that reducing the number of operations and threads has
reduced my average latency.

I use injectargs to try out new values while I monitor cluster latency.
I monitor latency while the cluster is healthy and while it is recovering.
Only if a change is deemed better will I persist it to ceph.conf.  This
gives me a fallback: any change that causes massive problems can be
undone with a restart or reboot.
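That workflow might look something like the following. The commands are standard Ceph CLI (injectargs changes are held in memory only, so a daemon restart reverts them); the particular value shown is just an example, not a recommendation:

```shell
# Apply a candidate value to all OSDs at runtime only (not persisted,
# so a restart reverts it) -- example value, not a recommendation:
ceph tell osd.* injectargs '--osd-max-backfills 1'

# Watch per-OSD commit/apply latency while healthy and while recovering:
ceph osd perf

# Only if the change proves out, persist it in ceph.conf under [osd]:
#   osd max backfills = 1
```

These commands assume a running cluster and an admin keyring, so treat them as a workflow sketch rather than something to paste blindly.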


So far, the configs that I've written to ceph.conf are
[global]
   mon osd down out interval = 900
   mon osd min down reporters = 9
   mon osd min down reports = 12
   osd pool default flag hashpspool = true

[osd]
   osd max backfills = 1
   osd recovery max active = 1
   osd recovery op priority = 1


I have it on my list to investigate filestore max sync interval.  And
now that I've pasted that, I need to revisit the min down
reports/reporters.  I have some nodes with 10 OSDs, and I don't want any
one node to be able to mark the rest of the cluster down (it happened once).
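For the reporter concern above, the usual fix is to require more distinct reporters than any single host can supply. A hypothetical ceph.conf fragment (the value 11 assumes at most 10 OSDs per host, as described above):

```shell
# ceph.conf fragment (assumed values, sized for hosts with up to 10 OSDs):
# [global]
#     # More reporters required than any one host has OSDs, so a single
#     # flaky node can't mark OSDs on other hosts down by itself.
#     mon osd min down reporters = 11
```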




On Sat, Nov 22, 2014 at 6:24 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx
<mailto:andrei@xxxxxxxxxx>> wrote:

    Hello guys,

    Could some one comment on the optimal or recommended values of
    various threads values in ceph.conf?

    At the moment I have the following settings:

    filestore_op_threads = 8
    osd_disk_threads = 8
    osd_op_threads = 8
    filestore_merge_threshold = 40
    filestore_split_multiple = 8

    Are these reasonable for a small cluster made of 7.2K SAS disks with
    ssd journals with a ratio of 4:1?

    What are the settings that other people are using?

    Thanks

    Andrei



    _______________________________________________
    ceph-users mailing list
    ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




