The main difference is that rados bench uses 4MB objects while your dd test uses a 4k block size. rados bench shows an average of 283 IOPS, which at a 4k block size would be around 1.1 MB/s, so it is roughly consistent with the dd result. Monitor your CPU usage and network latency with something like atop on the OSD nodes and check what might be causing the problem (see the example commands at the end of this mail).

On Wed, Dec 27, 2017 at 7:31 AM, kevin parrikar <kevin.parker092@xxxxxxxxx> wrote:
> Hi All,
> I upgraded my cluster from Hammer to Jewel and then to Luminous, and changed
> from filestore to a bluestore backend.
>
> On a KVM VM with 4 CPUs / 2 GB RAM I attached a 20 GB rbd volume as vdc
> and performed the following test:
>
> dd if=/dev/zero of=/dev/vdc bs=4k count=1000 oflag=direct
> 1000+0 records in
> 1000+0 records out
> 4096000 bytes (4.1 MB) copied, 3.08965 s, 1.3 MB/s
>
> It consistently gives 1.3 MB/s, which I feel is too low. I have 3 ceph OSD
> nodes, each with 24 x 15k RPM disks, with a replication of 2, connected by
> 2x10G LACP-bonded NICs with an MTU of 9100.
>
> Rados Bench results:
>
> rados bench -p volumes 4 write
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size
> 4194304 for up to 4 seconds or 0 objects
> Object prefix: benchmark_data_ceph3.sapiennetworks.com_820994
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0       0         0         0         0         0            -           0
>     1      16       276       260   1039.98      1040    0.0165053   0.0381299
>     2      16       545       529   1057.92      1076     0.043151   0.0580376
>     3      16       847       831   1107.91      1208    0.0394811   0.0567684
>     4      16      1160      1144    1143.9      1252      0.63265   0.0541888
> Total time run:         4.099801
> Total writes made:      1161
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     1132.74
> Stddev Bandwidth:       101.98
> Max bandwidth (MB/sec): 1252
> Min bandwidth (MB/sec): 1040
> Average IOPS:           283
> Stddev IOPS:            25
> Max IOPS:               313
> Min IOPS:               260
> Average Latency(s):     0.0560897
> Stddev Latency(s):      0.107352
> Max latency(s):         1.02123
> Min latency(s):         0.00920514
> Cleaning up (deleting benchmark objects)
> Removed 1161 objects
> Clean up completed and total clean up time: 0.079850
>
> After upgrading to Luminous I executed:
>
> ceph osd crush tunables optimal
>
> ceph.conf:
>
> [global]
> fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
> mon_initial_members = node-16 node-30 node-31
> mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.16.1.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.16.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> max_open_files = 131072
> debug_default = False
>
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
>
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 1000000
> rgw_keystone_url = http://192.168.1.3:35357
> rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
> rgw_init_timeout = 360000
> host = controller2
> rgw_dns_name = *.sapiennetworks.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
>
> [osd]
> journal_queue_max_ops = 3000
> objecter_inflight_ops = 10240
> journal_queue_max_bytes = 1048576000
> filestore_queue_max_ops = 500
> osd_mkfs_type = xfs
> osd_mount_options_xfs = rw,relatime,inode64,logbsize=256k,allocsize=4M
> osd_op_threads = 20
> filestore_queue_committing_max_ops = 5000
> journal_max_write_entries = 1000
> objecter_infilght_op_bytes = 1048576000
> filestore_queue_max_bytes = 1048576000
> filestore_max_sync_interval = 10
> journal_max_write_bytes = 1048576000
> filestore_queue_committing_max_bytes = 1048576000
> ms_dispatch_throttle_bytes = 1048576000
>
> ceph -s
>   cluster:
>     id:     06c5c906-fc43-499f-8a6f-6c8e21807acf
>     health: HEALTH_WARN
>             application not enabled on 2 pool(s)
>
>   services:
>     mon: 3 daemons, quorum controller3,controller2,controller1
>     mgr: controller1(active)
>     osd: 72 osds: 72 up, 72 in
>     rgw: 1 daemon active
>
>   data:
>     pools:   5 pools, 6240 pgs
>     objects: 12732 objects, 72319 MB
>     usage:   229 GB used, 39965 GB / 40195 GB avail
>     pgs:     6240 active+clean
>
> Can someone suggest a way to improve this?
>
> Thanks,
> Kevin
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
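
For an apples-to-apples comparison, you can tell rados bench to use 4k writes instead of the default 4MB objects (pool name taken from your mail; the -b and -t values are just examples):

rados bench -p volumes 10 write -b 4096 -t 1

With -t 1 it issues one write at a time, which is roughly what dd with oflag=direct does, so I would expect a number close to your 1.3 MB/s: each 4k write has to reach both replicas before the next one starts, and at ~3 ms per round trip that caps out around 325 IOPS x 4k = ~1.3 MB/s no matter how fast the disks are. Re-running with -t 16 or higher should show much better numbers if the cluster itself is healthy.

Inside the VM, fio gives more control than dd; something like this (it writes to the device, just like your dd test, so use a scratch volume only):

fio --name=rbd4k --filename=/dev/vdc --direct=1 --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 --runtime=30 --time_based

If IOPS scale up as you raise iodepth, the bottleneck is per-operation latency rather than cluster throughput.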
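
While one of the 4k tests is running, it is also worth watching per-OSD latency from a monitor/admin node:

ceph osd perf

and running something like "atop 2" (2-second interval) on the OSD nodes to see whether the disks, the CPUs, or the network are the busy part. One more thing to note: the filestore_* and journal_* options in your [osd] section no longer do anything now that the OSDs are on bluestore.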