slow 4k writes, Luminous with bluestore backend

Hi All,
I upgraded my cluster from Hammer to Jewel and then to Luminous, and switched from the filestore to the bluestore backend.

On a KVM VM with 4 CPUs / 2 GB RAM, I attached a 20 GB RBD volume as vdc and ran the following test:

dd if=/dev/zero of=/dev/vdc bs=4k count=1000 oflag=direct
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB) copied, 3.08965 s, 1.3 MB/s

It consistently gives 1.3 MB/s, which I feel is too low. I have 3 Ceph OSD nodes, each with 24 x 15k RPM drives, replication size 2, connected over 2x10G LACP-bonded NICs with an MTU of 9100.
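
For reference, 1.3 MB/s of 4k direct writes works out to roughly 325 IOPS, i.e. about 3 ms per synchronous write, so the dd run above mostly measures single-request round-trip latency at queue depth 1. One way to separate per-write latency from aggregate small-write throughput is to repeat the test at a higher queue depth; a rough sketch with fio, assuming fio is available in the guest and /dev/vdc can still be overwritten:

# 4k direct writes at queue depth 1 (comparable to the dd test) and at queue depth 16
fio --name=qd1 --filename=/dev/vdc --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=1 --runtime=30 --time_based
fio --name=qd16 --filename=/dev/vdc --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=16 --runtime=30 --time_based

If the qd16 run scales well, the cluster itself is likely fine and the 1.3 MB/s mostly reflects the latency of serialized 4k writes.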

Rados Bench results:

rados bench -p volumes 4 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 4 seconds or 0 objects
Object prefix: benchmark_data_ceph3.sapiennetworks.com_820994
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       276       260   1039.98      1040   0.0165053   0.0381299
    2      16       545       529   1057.92      1076    0.043151   0.0580376
    3      16       847       831   1107.91      1208   0.0394811   0.0567684
    4      16      1160      1144    1143.9      1252     0.63265   0.0541888
Total time run:         4.099801
Total writes made:      1161
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     1132.74
Stddev Bandwidth:       101.98
Max bandwidth (MB/sec): 1252
Min bandwidth (MB/sec): 1040
Average IOPS:           283
Stddev IOPS:            25
Max IOPS:               313
Min IOPS:               260
Average Latency(s):     0.0560897
Stddev Latency(s):      0.107352
Max latency(s):         1.02123
Min latency(s):         0.00920514
Cleaning up (deleting benchmark objects)
Removed 1161 objects
Clean up completed and total clean up time :0.079850
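
The rados bench run above uses 4 MB objects, so it mostly shows streaming bandwidth rather than the small-write path the dd test hits. A 4k rados bench would be a closer match; a sketch, assuming extra benchmark writes against the volumes pool are acceptable:

rados bench -p volumes 30 write -b 4096 -t 1
rados bench -p volumes 30 write -b 4096 -t 16

Comparing the -t 1 and -t 16 results should show how much of the 1.3 MB/s is simply queue-depth-1 latency.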


After upgrading to Luminous I executed:
ceph osd crush tunables optimal
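
To confirm what the cluster is actually using after that change, the active tunables can be dumped, e.g.:

ceph osd crush show-tunables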

ceph.conf

[global]
fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
mon_initial_members = node-16 node-30 node-31
mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
log_to_syslog_level = info
log_to_syslog = True
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 64
public_network = 172.16.1.0/24
log_to_syslog_facility = LOG_LOCAL0
osd_journal_size = 2048
auth_supported = cephx
osd_pool_default_pgp_num = 64
osd_mkfs_type = xfs
cluster_network = 172.16.1.0/24
osd_recovery_max_active = 1
osd_max_backfills = 1
max_open_files = 131072
debug_default = False


[client]
rbd_cache_writethrough_until_flush = True
rbd_cache = True

[client.radosgw.gateway]
rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
rgw_socket_path = /tmp/radosgw.sock
rgw_keystone_revocation_interval = 1000000
rgw_keystone_url = http://192.168.1.3:35357
rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
rgw_init_timeout = 360000
host = controller2
rgw_dns_name = *.sapiennetworks.com
rgw_print_continue = True
rgw_keystone_token_cache_size = 10
rgw_data = /var/lib/ceph/radosgw
user = www-data

[osd]
journal_queue_max_ops = 3000
objecter_inflight_ops = 10240
journal_queue_max_bytes = 1048576000
filestore_queue_max_ops = 500
osd_mkfs_type = xfs
osd_mount_options_xfs = rw,relatime,inode64,logbsize=256k,allocsize=4M
osd_op_threads = 20
filestore_queue_committing_max_ops = 5000
journal_max_write_entries = 1000
objecter_inflight_op_bytes = 1048576000
filestore_queue_max_bytes = 1048576000
filestore_max_sync_interval = 10
journal_max_write_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
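
Note that the filestore_* and journal_* options in the [osd] section only apply to filestore OSDs, so they should be inert now that the OSDs run bluestore. What an OSD is actually running with can be checked through its admin socket on the host that carries it; a sketch, using osd.0 as an example:

ceph daemon osd.0 config show | grep bluestore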

ceph -s
  cluster:
    id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
    health: HEALTH_WARN
            application not enabled on 2 pool(s)

  services:
    mon: 3 daemons, quorum controller3,controller2,controller1
    mgr: controller1(active)
    osd: 72 osds: 72 up, 72 in
    rgw: 1 daemon active

  data:
    pools:   5 pools, 6240 pgs
    objects: 12732 objects, 72319 MB
    usage:   229 GB used, 39965 GB / 40195 GB avail
    pgs:     6240 active+clean
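
The HEALTH_WARN about applications is a new Luminous check and is unrelated to performance; ceph health detail should name the two pools, and the warning clears once each pool is tagged with its application. As an example, for an RBD pool named volumes:

ceph osd pool application enable volumes rbd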

Can someone suggest a way to improve this?

Thanks,
Kevin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
