From different search results I read, disabling cephx can help. Also,
https://static.linaro.org/connect/san19/presentations/san19-120.pdf
recommends some changes to the BlueStore cache settings:

[osd]
bluestore_cache_autotune = 0
bluestore_cache_kv_ratio = 0.2
bluestore_cache_meta_ratio = 0.8
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,write_buffer_size=64M,compaction_readahead_size=2M
bluestore_cache_size_hdd = 536870912   # 512 MiB BlueStore cache for HDD-backed OSDs
osd_min_pg_log_entries = 10
osd_max_pg_log_entries = 10
osd_pg_log_dups_tracked = 10
osd_pg_log_trim_min = 10

But nothing changed much. The problem seems to be mostly with small
writes; when I tested the same workload with 128k or even 64k block
sizes, the results were much better.

Any suggestions?

Thanks and Regards,
Athreya

On Tue, Nov 10, 2020 at 8:51 PM <athreyavc@xxxxxxxxx> wrote:
> Hi,
>
> We have recently deployed a Ceph cluster with:
>
> 12 OSD nodes (16 cores + 200 GB RAM + 30 disks of 14 TB each), running CentOS 8
> 3 monitor nodes (8 cores + 16 GB RAM), running CentOS 8
>
> We are running Ceph Octopus and using RBD block devices.
>
> We have three Ceph client nodes (16 cores + 30 GB RAM, running CentOS 8)
> across which the RBDs are mapped and mounted, 25 RBDs on each client node.
> Each RBD is 10 TB in size and formatted with an EXT4 file system.
>
> On the network side, we have a 10 Gbps active/passive bond on all the Ceph
> cluster nodes, including the clients. Jumbo frames are enabled and the MTU
> is 9000.
>
> This is a new cluster and the cluster health reports OK, but we see high
> I/O wait during writes.
>
> From one of the clients:
>
> 15:14:30     CPU   %user  %nice  %system  %iowait  %steal   %idle
> 15:14:31     all    0.06   0.00     1.00    45.03    0.00    53.91
> 15:14:32     all    0.06   0.00     0.94    41.28    0.00    57.72
> 15:14:33     all    0.06   0.00     1.25    45.78    0.00    52.91
> 15:14:34     all    0.00   0.00     1.06    40.07    0.00    58.86
> 15:14:35     all    0.19   0.00     1.38    41.04    0.00    57.39
> Average:     all    0.08   0.00     1.13    42.64    0.00    56.16
>
> and the system load is very high:
>
> top - 15:19:15 up 34 days, 41 min,  2 users,  load average: 13.49, 13.62, 13.83
>
> From 'atop', one of the CPUs shows this:
>
> CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% |
> guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%
>
> On the OSD nodes, we don't see much utilization of the disks.
>
> RBD caching values are at their defaults.
>
> Are we overlooking some configuration item?
>
> Thanks and Regards,
>
> At
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
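For anyone wanting to reproduce the block-size comparison described above, a
sketch of an fio job file along these lines could be used. The mount point,
file size, and runtime are assumptions to adjust for your environment, not
values from the original tests:

```ini
; Hypothetical fio job file: compare random-write performance at several
; block sizes against a directory on one of the mounted RBDs.
[global]
directory=/mnt/rbd0      ; assumed mount point of one RBD; change as needed
size=1G                  ; per-job test file size (illustrative)
ioengine=libaio
direct=1                 ; bypass the page cache to measure the device path
rw=randwrite
runtime=60
time_based
stonewall                ; run the jobs one after another, not in parallel

[randwrite-4k]
bs=4k

[randwrite-64k]
bs=64k

[randwrite-128k]
bs=128k
```

Running `fio jobfile.fio` then lets you compare IOPS and latency per block
size directly, which helps separate a small-write bottleneck (e.g. pg log and
metadata overhead per operation) from raw bandwidth limits.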
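Since the thread mentions that RBD caching values were left at their
defaults, one avenue to examine is the client-side cache configuration. A
sketch of explicit write-back cache settings in ceph.conf on the clients
might look like the following; the sizes are illustrative assumptions, not
recommendations. Note that these options affect librbd clients only, so they
do not apply if the RBDs are mapped with the kernel client (krbd):

```ini
; Illustrative librbd cache settings (has no effect on kernel-mapped RBDs)
[client]
rbd cache = true                            ; enable write-back caching
rbd cache size = 67108864                   ; 64 MiB cache, illustrative
rbd cache max dirty = 50331648              ; 48 MiB dirty limit, illustrative
rbd cache writethrough until flush = true   ; stay write-through until the
                                            ; guest/app issues its first flush
```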