The main difference is that rados bench uses 4MB objects while your dd test uses a 4k block size. rados bench shows an average of 283 IOPS, which at a 4k block size would be around 1.1 MB/s, so it is roughly consistent with the dd result. Monitor your CPU usage and network latency with something like atop on the OSD nodes and check what might be causing the problem (see the example commands at the end of this mail).

On Wed, Dec 27, 2017 at 7:31 AM, kevin parrikar <kevin.parker092@xxxxxxxxx> wrote:
> Hi All,
> I upgraded my cluster from Hammer to Jewel and then to Luminous, and changed
> from filestore to a bluestore backend.
>
> On a KVM VM with 4 CPUs / 2 GB RAM I attached a 20 GB rbd volume as vdc
> and performed the following test:
>
> dd if=/dev/zero of=/dev/vdc bs=4k count=1000 oflag=direct
> 1000+0 records in
> 1000+0 records out
> 4096000 bytes (4.1 MB) copied, 3.08965 s, 1.3 MB/s
>
> It consistently gives 1.3 MB/s, which I feel is too low. I have 3 ceph OSD
> nodes, each with 24 x 15k RPM disks, with a replication of 2, connected by
> 2x10G LACP-bonded NICs with an MTU of 9100.
>
> Rados Bench results:
>
> rados bench -p volumes 4 write
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 4 seconds or 0 objects
> Object prefix: benchmark_data_ceph3.sapiennetworks.com_820994
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0       0         0         0         0         0            -           0
>     1      16       276       260   1039.98      1040    0.0165053   0.0381299
>     2      16       545       529   1057.92      1076     0.043151   0.0580376
>     3      16       847       831   1107.91      1208    0.0394811   0.0567684
>     4      16      1160      1144    1143.9      1252      0.63265   0.0541888
> Total time run:         4.099801
> Total writes made:      1161
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     1132.74
> Stddev Bandwidth:       101.98
> Max bandwidth (MB/sec): 1252
> Min bandwidth (MB/sec): 1040
> Average IOPS:           283
> Stddev IOPS:            25
> Max IOPS:               313
> Min IOPS:               260
> Average Latency(s):     0.0560897
> Stddev Latency(s):      0.107352
> Max latency(s):         1.02123
> Min latency(s):         0.00920514
> Cleaning up (deleting benchmark objects)
> Removed 1161 objects
> Clean up completed and total clean up time: 0.079850
>
> After upgrading to Luminous I executed:
>
> ceph osd crush tunables optimal
>
> ceph.conf:
>
> [global]
> fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
> mon_initial_members = node-16 node-30 node-31
> mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.16.1.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.16.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> max_open_files = 131072
> debug_default = False
>
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
>
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 1000000
> rgw_keystone_url = http://192.168.1.3:35357
> rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
> rgw_init_timeout = 360000
> host = controller2
> rgw_dns_name = *.sapiennetworks.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
>
> [osd]
> journal_queue_max_ops = 3000
> objecter_inflight_ops = 10240
> journal_queue_max_bytes = 1048576000
> filestore_queue_max_ops = 500
> osd_mkfs_type = xfs
> osd_mount_options_xfs = rw,relatime,inode64,logbsize=256k,allocsize=4M
> osd_op_threads = 20
> filestore_queue_committing_max_ops = 5000
> journal_max_write_entries = 1000
> objecter_infilght_op_bytes = 1048576000
> filestore_queue_max_bytes = 1048576000
> filestore_max_sync_interval = 10
> journal_max_write_bytes = 1048576000
> filestore_queue_committing_max_bytes = 1048576000
> ms_dispatch_throttle_bytes = 1048576000
>
> ceph -s
>   cluster:
>     id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
>     health: HEALTH_WARN
>             application not enabled on 2 pool(s)
>
>   services:
>     mon: 3 daemons, quorum controller3,controller2,controller1
>     mgr: controller1(active)
>     osd: 72 osds: 72 up, 72 in
>     rgw: 1 daemon active
>
>   data:
>     pools:   5 pools, 6240 pgs
>     objects: 12732 objects, 72319 MB
>     usage:   229 GB used, 39965 GB / 40195 GB avail
>     pgs:     6240 active+clean
>
> Can someone suggest a way to improve this?
>
> Thanks,
> Kevin
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
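
For an apples-to-apples comparison, you can tell rados bench to use 4k writes instead of the default 4MB objects (pool name taken from your mail; the -b and -t values are just examples):

rados bench -p volumes 10 write -b 4096 -t 1

With -t 1 it issues one write at a time, which is roughly what dd with oflag=direct does, so I would expect a number close to your 1.3 MB/s: each 4k write has to reach both replicas before the next one starts, and at ~3 ms per round trip that caps out around 325 IOPS x 4k = ~1.3 MB/s no matter how fast the disks are. Re-running with -t 16 or higher should show much better numbers if the cluster itself is healthy.

Inside the VM, fio gives more control than dd; something like this (it writes to the device, just like your dd test, so use a scratch volume only):

fio --name=rbd4k --filename=/dev/vdc --direct=1 --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 --runtime=30 --time_based

If IOPS scale up as you raise iodepth, the bottleneck is per-operation latency rather than cluster throughput.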
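
While one of the 4k tests is running, it is also worth watching per-OSD latency from a monitor/admin node:

ceph osd perf

and running something like "atop 2" (2-second interval) on the OSD nodes to see whether the disks, the CPUs, or the network are the busy part. One more thing to note: the filestore_* and journal_* options in your [osd] section no longer do anything now that the OSDs are on bluestore.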