Re: speed decrease with size

Hello,

On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:

> I am testing attached volume storage on our OpenStack cluster, which uses
> Ceph for block storage.
> Our Ceph nodes have large SSDs for their journals, 50+ GB for each OSD. I'm
> thinking some parameter is a little off, because with relatively small
> writes I am already seeing drastically reduced write speeds.
> 
Large journals are a waste for most people, especially when your backing
storage is HDDs.

> 
> We have 2 nodes with 12 total OSDs, each with a 50 GB SSD journal.
> 
I hope that's not your plan for production; with a replica count of 2 you're
looking at pretty much guaranteed data loss over time, unless your OSDs
are actually RAIDs.
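
For reference, the stock pool defaults recommended for production (which
assume at least three OSD nodes, so treat the snippet below as a sketch
rather than something tuned for your two-node setup) are:

  osd_pool_default_size = 3
  osd_pool_default_min_size = 2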

5GB journals tend to be overkill already.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html
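
The usual rule of thumb (ballpark numbers, assuming the default
filestore_max_sync_interval of 5 seconds and an HDD that sustains
roughly 120 MB/s) works out to:

  osd journal size = 2 * (expected throughput * filestore max sync interval)
                   = 2 * 120 MB/s * 5 s
                   = ~1200 MB

so even 5GB leaves plenty of headroom, and 50GB buys you nothing.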

If you were to actually watch your OSD nodes during those tests with
something like atop or "iostat -x", you'd likely see that with prolonged
writes you end up at whatever speed your HDDs can sustain, i.e. you'd see
them (all of them, or individual ones) being quite busy.
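
Something along these lines on each OSD node while the dd is running will
show it (the 2-second interval and the device names are just examples):

  iostat -x 2 sdb sdc sdd

Watch the %util and await columns for both the HDDs and the journal SSDs.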

Lastly, for nearly everybody in real-life situations, bandwidth/throughput
is a distant second to latency considerations.
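
If you want to look at that side of it, a quick (if crude) way with the
tool you're already using is to force each write to be synchronous; the
file name below is just an example:

  dd if=/dev/zero of=/mnt/ext4/latency-test bs=4k count=1000 oflag=dsync
  rm -f /mnt/ext4/latency-test

The resulting MB/s number is dominated by per-write round-trip latency
rather than raw throughput.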

Christian

> 
>  here is our Ceph config
> 
> [global]
> fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> mon_initial_members = node-5 node-4 node-3
> mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 1
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 192.168.0.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 50000
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 192.168.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> 
> [client]
> rbd_cache = True
> rbd_cache_writethrough_until_flush = True
> 
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 1000000
> rgw_keystone_url = 192.168.0.2:35357
> rgw_keystone_admin_token = ZBz37Vlv
> host = node-3
> rgw_dns_name = *.ciminc.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
> 
> This is the degradation I am speaking of:
> 
> 
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f
> /mnt/ext4/output;
> 1024+0 records in
> 1024+0 records out
> 1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s
> 
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f
> /mnt/ext4/output;
> 2048+0 records in
> 2048+0 records out
> 2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s
> 
>  dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f
> /mnt/ext4/output;
> 3072+0 records in
> 3072+0 records out
> 3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s
> 
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
> /mnt/ext4/output;
> 5120+0 records in
> 5120+0 records out
> 5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s
> 
> Any suggestions for improving the large write degradation?
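
As an aside (a guess about your test setup rather than anything I can see
from here): the 1.2 GB/s figure is almost certainly the client-side page
cache and rbd cache rather than the cluster. Adding conv=fdatasync makes
dd wait until the data has actually reached storage and gives numbers that
are comparable across sizes, e.g.:

  dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k conv=fdatasync
  rm -f /mnt/ext4/output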


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


