The debug settings are at their defaults, mostly 1/5 and 0/5. Shall I try setting them all to 0?

On Wed, Feb 20, 2019 at 9:17 PM Darius Kasparavičius <daznis@xxxxxxxxx> wrote:
>
> Hello,
>
> Check your CPU usage when you are doing those kinds of operations. We
> had a similar issue where our CPU monitoring was reporting fine (< 40%
> usage), but the actual load on the nodes was high, around 60-80%. If
> possible, try disabling hyper-threading and check the real CPU usage.
> If you are hitting CPU limits, you can try disabling CRC on messages:
> ms_nocrc
> ms_crc_data
> ms_crc_header
>
> Also set all your debug messages to 0.
> If you haven't already, you can also lower your recovery settings a little:
> osd recovery max active
> osd max backfills
>
> You can also lower your filestore threads:
> filestore op threads
>
> If you can, also switch from filestore to bluestore. This will also
> lower your CPU usage. I'm not sure it's bluestore itself that does
> it, but I'm seeing lower CPU usage after moving to bluestore + rocksdb
> compared to filestore + leveldb.
>
>
> On Wed, Feb 20, 2019 at 4:27 PM M Ranga Swami Reddy
> <swamireddy@xxxxxxxxx> wrote:
> >
> > That's expected from Ceph by design. But in our case, we are following
> > all the recommendations (rack failure domain, separate replication
> > network, etc.) and still see client I/O performance issues when one
> > OSD goes down.
> >
> > On Tue, Feb 19, 2019 at 10:56 PM David Turner <drakonstein@xxxxxxxxx> wrote:
> > >
> > > With a RACK failure domain, you should be able to have an entire rack powered down without noticing any major impact on the clients. I regularly take down OSDs and nodes for maintenance and upgrades without seeing any problems with client I/O.
> > >
> > > On Tue, Feb 12, 2019 at 5:01 AM M Ranga Swami Reddy <swamireddy@xxxxxxxxx> wrote:
> > >>
> > >> Hello - I have a couple of questions on Ceph cluster stability, even
> > >> though we follow all the recommendations below:
> > >> - separate replication and data networks
> > >> - RACK as the failure domain
> > >> - SSDs for journals (1:4 ratio)
> > >>
> > >> Q1 - When one OSD goes down, cluster I/O drops drastically and
> > >> customer apps are impacted. Why?
> > >> Q2 - What is the "stability ratio"? That is, with the above setup,
> > >> is the cluster expected to stay in a workable condition when one
> > >> OSD or one node goes down?
> > >>
> > >> Thanks
> > >> Swami
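
For reference, the settings suggested above would look roughly like this in ceph.conf. This is only a sketch: ms_nocrc applies to older releases (it was superseded by ms_crc_data/ms_crc_header around Jewel), and the recovery/backfill values shown are illustrative starting points, not recommendations for any particular cluster.

    [global]
    # Debug options take "log-level/in-memory-level"; defaults are mostly 1/5 or 0/5.
    debug ms = 0/0
    debug osd = 0/0
    debug filestore = 0/0
    debug journal = 0/0

    # Skip message checksumming to save CPU.
    # Newer releases use the two options below; older ones use "ms nocrc = true".
    ms crc data = false
    ms crc header = false

    [osd]
    # Throttle recovery/backfill so it competes less with client I/O.
    osd max backfills = 1
    osd recovery max active = 1

    # Filestore only: fewer op threads means less CPU (the default is 2).
    filestore op threads = 2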
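
The same options can also be injected at runtime without restarting the OSDs, e.g. (again a sketch; whether a given option takes effect live varies by release):

    # Silence debug logging on all OSDs on the fly.
    ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0'

    # Throttle recovery and backfill cluster-wide.
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'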