Re: speed decrease with size

On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer <chibi@xxxxxxx> wrote:

Hello,

On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote:

> I am testing attached volume storage on our OpenStack cluster, which uses
> Ceph for block storage.
> Our Ceph nodes have large SSDs for their journals, 50+GB for each OSD. I'm
> thinking some parameter is a little off, because even with relatively small
> writes I am seeing drastically reduced write speeds.
>
Large journals are a waste for most people, especially when your backing
storage is HDDs.

>
> We have 2 nodes with 12 OSDs total, each with a 50GB SSD journal.
>
I hope that's not your plan for production; with a replica count of 2 you're
looking at pretty much guaranteed data loss over time, unless your OSDs
are actually RAIDs.

I am aware that a replica count of 3 is suggested, thanks.
 
5GB journals tend to be overkill already.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008606.html
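For what it's worth, the filestore sizing rule of thumb from the Ceph docs is

    osd journal size = 2 * (expected throughput * filestore max sync interval)

With, say, an HDD that sustains ~120 MB/s behind each journal and the default
5 second sync interval, that works out to 2 * 120 * 5 = 1200 MB, i.e. a bit
over 1GB; the remaining ~49GB of a 50GB journal will essentially never be
touched. (The 120 MB/s figure is illustrative, not measured from your
hardware.)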

If you were to actually look at your OSD nodes during those tests with
something like atop or "iostat -x", you'd likely see that with prolonged
writes you end up at the speed your HDDs can sustain, i.e. you'd see them
(all or individually) being quite busy.
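For example, something like this on each OSD node while the test runs (the
device names are placeholders for your actual OSD disks):

    # extended per-device stats, refreshed every second;
    # watch the %util and await columns
    iostat -x /dev/sdb /dev/sdc 1

If %util sits near 100% on the HDDs while the journal SSD stays mostly idle,
the spindles are your bottleneck.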

That is what I was thinking as well, which is not what I want. I want to make
better use of these large SSD journals. If I have a 50GB journal and only want
to write 5GB of data, I should be able to get near-SSD speed for that
operation. Why am I not? Maybe I should increase filestore_max_sync_interval.
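Something like this in ceph.conf is what I have in mind (the value is an
illustrative guess on my part, not a tested recommendation):

    [osd]
    # default is 5 seconds; let the journal absorb longer bursts
    # before filestore starts flushing to the HDDs
    filestore_max_sync_interval = 30

Though I realize the flush to the backing HDDs still has to happen eventually,
so at best this helps bursts shorter than the interval.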
 

Lastly, for nearly everybody in real-life situations, bandwidth/throughput is
a distant second to latency considerations.
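If you want to see that in practice, a small-block, queue-depth-1 fio run
against the RBD-backed mount is a rough way to measure it (the path and sizes
below are just placeholders):

    fio --name=writelat --filename=/mnt/ext4/fio-test --rw=randwrite \
        --bs=4k --iodepth=1 --direct=1 --size=256M --runtime=60 \
        --ioengine=libaio

The completion latency (clat) it reports matters far more to most real
workloads than sequential MB/s from dd.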
 
Thanks for the advice, though.
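One thing I noticed about my own dd numbers below: without a sync flag, the
first run mostly measures the page cache and rbd_cache, which is why 1GB
appears to "write" at 1.2 GB/s. Repeating the test with a forced flush at the
end, e.g.

    dd if=/dev/zero of=/mnt/ext4/output bs=1M count=1k conv=fdatasync

should give numbers that actually include the write-out to Ceph.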


Christian

>
> Here is our Ceph config:
>
> [global]
> fsid = 19bc15fd-c0cc-4f35-acd2-292a86fbcf7d
> mon_initial_members = node-5 node-4 node-3
> mon_host = 192.168.0.8 192.168.0.7 192.168.0.13
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 1
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 192.168.0.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 50000
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 192.168.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
>
> [client]
> rbd_cache = True
> rbd_cache_writethrough_until_flush = True
>
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 1000000
> rgw_keystone_url = 192.168.0.2:35357
> rgw_keystone_admin_token = ZBz37Vlv
> host = node-3
> rgw_dns_name = *.ciminc.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
>
> This is the degradation I am speaking of:
>
>
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=1k; rm -f
> /mnt/ext4/output;
> 1024+0 records in
> 1024+0 records out
> 1048576000 bytes (1.0 GB) copied, 0.887431 s, 1.2 GB/s
>
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=2k; rm -f
> /mnt/ext4/output;
> 2048+0 records in
> 2048+0 records out
> 2097152000 bytes (2.1 GB) copied, 3.75782 s, 558 MB/s
>
>  dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=3k; rm -f
> /mnt/ext4/output;
> 3072+0 records in
> 3072+0 records out
> 3145728000 bytes (3.1 GB) copied, 10.0054 s, 314 MB/s
>
> dd if=/dev/zero of=/mnt/ext4/output bs=1000k count=5k; rm -f
> /mnt/ext4/output;
> 5120+0 records in
> 5120+0 records out
> 5242880000 bytes (5.2 GB) copied, 24.1971 s, 217 MB/s
>
> Any suggestions for improving the large write degradation?


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/



--
-------------.
Ben Erridge
Center For Information Management, Inc.
3550 West Liberty Road Ste 1
Ann Arbor, MI 48103