CEPH I/O Performance with OpenStack

Hello,

I'm using ceph-0.80.7 with Mirantis OpenStack Icehouse - RBD for Nova ephemeral disks and Glance images.

I have two Ceph OSD nodes with the following specifications:
2x Ceph OSD nodes - replication factor 2
Model : SuperMicro X8DT3
CPU : Dual Intel E5620
RAM : 32G
HDD : 2x 480GB SSD RAID-1 ( OS and journal )
      22x 4TB SATA RAID-10 ( OSD )

3x controllers - Ceph monitors
Model : ProLiant DL180 G6
CPU : Dual Intel E5620
RAM : 24G

*Network
Public : 1G NIC ( eth0 ) - Juniper 2200-48
Storage/Admin/Management : 10G NIC ( eth1 ) - Arista 7050T-36 (32x 10GE UTP, 4x 10GE SFP+)

*I'm getting very poor Ceph performance and high I/O wait on both writes and reads,
and when a light or deep scrub is running the load on the VMs goes crazy.
Tuning ceph.conf didn't help.

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = xx.xx.xx.xx xx.xx.xx.xx xx.xx.xx.xx
mon_initial_members = node-xx node-xx node-xx
fsid = 
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 50
public_network = xx.xx.xx.xx
osd_journal_size = 100000
auth_supported = cephx
osd_pool_default_pgp_num = 50
osd_pool_default_flag_hashpspool = true
osd_mkfs_type = xfs
cluster_network = xx.xx.xx.xx
mon_clock_drift_allowed = 2

[osd]

osd_op_threads=16
osd_disk_threads=4
osd_disk_thread_ioprio_priority=7
osd_disk_thread_ioprio_class=idle

filestore_op_threads=8
filestore_queue_max_ops=100000
filestore_queue_committing_max_ops=100000
filestore_queue_max_bytes=1073741824
filestore_queue_committing_max_bytes=1073741824
filestore_max_sync_interval=10
filestore_fd_cache_size=20240
filestore_flusher=false
filestore_flush_min=0
filestore_sync_flush=true

journal_dio=true
journal_aio=true
journal_max_write_bytes=1073741824
journal_max_write_entries=50000
journal_queue_max_bytes=1073741824
journal_queue_max_ops=100000

ms_dispatch_throttle_bytes=1073741824
objecter_inflight_op_bytes=1073741824
objecter_inflight_ops=1638400

osd_recovery_threads = 16
#osd_recovery_max_active = 2
#osd_recovery_max_chunk = 8388608
#osd_recovery_op_priority = 2
#osd_max_backfills = 1

[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_cache_size = 20 GiB
rbd_cache_max_dirty = 16 GiB
rbd_cache_target_dirty = 512 MiB
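
One thing I'm not sure about: osd_pool_default_pg_num only applies when a pool is created, so my existing pools may still be running with very few PGs. I was planning to check and, if needed, raise them along these lines (pool name and target value are placeholders; I'd pick the next power of two above OSDs x 100 / replicas, per the docs):

# check what the existing pools actually use
ceph osd lspools
ceph osd pool get <pool> pg_num
ceph osd pool get <pool> pgp_num

# raise them if they are too low (pg_num first, then pgp_num)
ceph osd pool set <pool> pg_num <new_pg_num>
ceph osd pool set <pool> pgp_num <new_pg_num>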


*Results inside a CentOS 6 64-bit VM:
[root@vm ~]# dd if=/dev/zero of=./largefile bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 17.3417 s, 61.9 MB/s
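
(I realize the dd above goes through the guest page cache and the RBD cache, so I also plan to repeat it with direct/synced I/O and to run rados bench from one of the cluster nodes as a sanity check - the pool name below is a placeholder:)

# inside the VM, bypassing the page cache
dd if=/dev/zero of=./largefile bs=1M count=1024 oflag=direct
dd if=/dev/zero of=./largefile bs=1M count=1024 conv=fdatasync

# from a monitor/OSD node, raw cluster write throughput for 30 seconds
rados bench -p <pool> 30 write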

[root@vm ~]# rm -rf /tmp/test && spew -i 50 -v -d --write -r -b 4096 10M /tmp/test
Iteration:        1    Total runtime: 00:00:00
WTR:    27753.91 KiB/s   Transfer time: 00:00:00    IOPS:     6938.48

Iteration:        2    Total runtime: 00:00:00
WTR:    29649.53 KiB/s   Transfer time: 00:00:00    IOPS:     7412.38

Iteration:        3    Total runtime: 00:00:01
WTR:    30897.44 KiB/s   Transfer time: 00:00:00    IOPS:     7724.36

Iteration:        4    Total runtime: 00:00:02
WTR:     7474.93 KiB/s   Transfer time: 00:00:01    IOPS:     1868.73

Iteration:        5    Total runtime: 00:00:02
WTR:    24810.11 KiB/s   Transfer time: 00:00:00    IOPS:     6202.53

Iteration:        6    Total runtime: 00:00:03
WTR:    28534.01 KiB/s   Transfer time: 00:00:00    IOPS:     7133.50

Iteration:        7    Total runtime: 00:00:03
WTR:    27687.95 KiB/s   Transfer time: 00:00:00    IOPS:     6921.99

Iteration:        8    Total runtime: 00:00:03
WTR:    29195.91 KiB/s   Transfer time: 00:00:00    IOPS:     7298.98

Iteration:        9    Total runtime: 00:00:04
WTR:    28315.53 KiB/s   Transfer time: 00:00:00    IOPS:     7078.88

Iteration:       10    Total runtime: 00:00:04
WTR:    27971.42 KiB/s   Transfer time: 00:00:00    IOPS:     6992.85

Iteration:       11    Total runtime: 00:00:04
WTR:    29873.39 KiB/s   Transfer time: 00:00:00    IOPS:     7468.35

Iteration:       12    Total runtime: 00:00:05
WTR:    32364.30 KiB/s   Transfer time: 00:00:00    IOPS:     8091.08

Iteration:       13    Total runtime: 00:00:05
WTR:    32619.98 KiB/s   Transfer time: 00:00:00    IOPS:     8155.00

Iteration:       14    Total runtime: 00:00:06
WTR:    18714.54 KiB/s   Transfer time: 00:00:00    IOPS:     4678.64

Iteration:       15    Total runtime: 00:00:06
WTR:    17070.37 KiB/s   Transfer time: 00:00:00    IOPS:     4267.59

Iteration:       16    Total runtime: 00:00:07
WTR:    22403.23 KiB/s   Transfer time: 00:00:00    IOPS:     5600.81

Iteration:       17    Total runtime: 00:00:07
WTR:    16076.39 KiB/s   Transfer time: 00:00:00    IOPS:     4019.10

Iteration:       18    Total runtime: 00:00:08
WTR:    26219.77 KiB/s   Transfer time: 00:00:00    IOPS:     6554.94

Iteration:       19    Total runtime: 00:00:08
WTR:    29054.01 KiB/s   Transfer time: 00:00:00    IOPS:     7263.50

Iteration:       20    Total runtime: 00:00:08
WTR:    27210.02 KiB/s   Transfer time: 00:00:00    IOPS:     6802.50

Iteration:       21    Total runtime: 00:00:09
WTR:    28502.72 KiB/s   Transfer time: 00:00:00    IOPS:     7125.68

Iteration:       22    Total runtime: 00:00:10
WTR:    11172.32 KiB/s   Transfer time: 00:00:00    IOPS:     2793.08

Iteration:       23    Total runtime: 00:00:10
WTR:    29038.44 KiB/s   Transfer time: 00:00:00    IOPS:     7259.61

Iteration:       24    Total runtime: 00:00:11
WTR:    25374.86 KiB/s   Transfer time: 00:00:00    IOPS:     6343.72

Iteration:       25    Total runtime: 00:00:11
WTR:    19123.03 KiB/s   Transfer time: 00:00:00    IOPS:     4780.76

Iteration:       26    Total runtime: 00:00:11
WTR:    27481.82 KiB/s   Transfer time: 00:00:00    IOPS:     6870.45

Iteration:       27    Total runtime: 00:00:12
WTR:    11416.62 KiB/s   Transfer time: 00:00:00    IOPS:     2854.15

Iteration:       28    Total runtime: 00:00:13
WTR:    33922.34 KiB/s   Transfer time: 00:00:00    IOPS:     8480.58

Iteration:       29    Total runtime: 00:00:13
WTR:    26893.30 KiB/s   Transfer time: 00:00:00    IOPS:     6723.32

Iteration:       30    Total runtime: 00:00:13
WTR:    27222.82 KiB/s   Transfer time: 00:00:00    IOPS:     6805.71

Iteration:       31    Total runtime: 00:00:14
WTR:    19842.92 KiB/s   Transfer time: 00:00:00    IOPS:     4960.73

Iteration:       32    Total runtime: 00:00:14
WTR:    27585.91 KiB/s   Transfer time: 00:00:00    IOPS:     6896.48

Iteration:       33    Total runtime: 00:00:15
WTR:    31579.30 KiB/s   Transfer time: 00:00:00    IOPS:     7894.83

Iteration:       34    Total runtime: 00:00:15
WTR:    26563.32 KiB/s   Transfer time: 00:00:00    IOPS:     6640.83

Iteration:       35    Total runtime: 00:00:15
WTR:    24829.90 KiB/s   Transfer time: 00:00:00    IOPS:     6207.48

Iteration:       36    Total runtime: 00:00:16
WTR:    26769.70 KiB/s   Transfer time: 00:00:00    IOPS:     6692.43

Iteration:       37    Total runtime: 00:00:16
WTR:    21256.06 KiB/s   Transfer time: 00:00:00    IOPS:     5314.01

Iteration:       38    Total runtime: 00:00:17
WTR:    14035.99 KiB/s   Transfer time: 00:00:00    IOPS:     3509.00

Iteration:       39    Total runtime: 00:00:17
WTR:    31576.48 KiB/s   Transfer time: 00:00:00    IOPS:     7894.12

Iteration:       40    Total runtime: 00:00:18
WTR:    27915.22 KiB/s   Transfer time: 00:00:00    IOPS:     6978.80

Iteration:       41    Total runtime: 00:00:18
WTR:    33392.14 KiB/s   Transfer time: 00:00:00    IOPS:     8348.03

Iteration:       42    Total runtime: 00:00:18
WTR:    27876.61 KiB/s   Transfer time: 00:00:00    IOPS:     6969.15

Iteration:       43    Total runtime: 00:00:19
WTR:    28092.05 KiB/s   Transfer time: 00:00:00    IOPS:     7023.01

Iteration:       44    Total runtime: 00:00:19
WTR:    29125.74 KiB/s   Transfer time: 00:00:00    IOPS:     7281.44

Iteration:       45    Total runtime: 00:00:19
WTR:    26937.87 KiB/s   Transfer time: 00:00:00    IOPS:     6734.47

Iteration:       46    Total runtime: 00:00:20
WTR:    23235.92 KiB/s   Transfer time: 00:00:00    IOPS:     5808.98

Iteration:       47    Total runtime: 00:00:20
WTR:    27946.07 KiB/s   Transfer time: 00:00:00    IOPS:     6986.52

Iteration:       48    Total runtime: 00:00:21
WTR:    17759.06 KiB/s   Transfer time: 00:00:00    IOPS:     4439.77

Iteration:       49    Total runtime: 00:00:23
WTR:     4779.38 KiB/s   Transfer time: 00:00:02    IOPS:     1194.84

Iteration:       50    Total runtime: 00:00:23
WTR:    27997.65 KiB/s   Transfer time: 00:00:00    IOPS:     6999.41

Total iterations:                               50
Total runtime:                            00:00:23
Total write transfer time (WTT):          00:00:23
Total write transfer rate (WTR):    21493.23 KiB/s
Total write IOPS:                    5373.31 IOPS
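
If it helps, I can also run fio with direct I/O inside the VM so the small random writes are not skewed by the page cache - something along these lines (assuming fio is available, e.g. from EPEL):

fio --name=rbd-randwrite --filename=/tmp/fio.test --size=1G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based --group_reporting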


I do not know whether the hardware is limiting performance; that's why I need your advice - maybe some tuning could help.

If it's a hardware issue, please help me find answers to the following questions.

1. Is it better to have a small number of OSD nodes with many hard disks each (such as the SuperMicro SC846TQ), or a larger number of OSD nodes with fewer disks each (such as the HP DL380p G8)?

I need around 20TB of usable storage. The SuperMicro SC846TQ can take 24 hard disks.
I could put 24x 960G SSDs - no RAID - in each of 3x SuperMicro servers - replication factor 3.

Or is it better to scale out with fewer disks per server, for example the HP DL380p G8 (2x Intel Xeon E5-2650), which can hold 12 disks,
and put 12x 960G SSDs - no RAID - in each of 6x OSD nodes - replication factor 3?
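
My rough capacity math for both options (please correct me if I'm off): 3 x 24 x 0.96 TB and 6 x 12 x 0.96 TB both come to about 69 TB raw, so with replication factor 3 either layout gives roughly 23 TB usable before free-space headroom, which covers the ~20 TB I need.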

2. I'm using Mirantis Fuel 5 for provisioning and deployment of the nodes.
When I attach the new Ceph OSD nodes to the environment, will data be rebalanced automatically from my current SuperMicro OSD nodes to the new servers once the deployment completes?
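
I assume that once the new OSDs are in the CRUSH map the cluster rebalances on its own, and that I can watch the progress and, if needed, bring the new OSDs in gradually by adjusting their CRUSH weights - please correct me if that's wrong:

# watch rebalancing
ceph -s
ceph -w

# optionally ramp a new OSD in step by step
ceph osd crush reweight osd.<id> <weight>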

3. I will use 2x 960G SSDs in RAID 1 for the OS.
Is it recommended to put the SSD journals on a separate partition of the same disks as the OS?
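
(If that's acceptable, I was planning to point the OSDs at the journal partitions roughly like this - the partition path is only an example:

[osd]
osd_journal = /dev/disk/by-partlabel/journal-$id

or by passing the journal device explicitly when preparing an OSD, e.g. ceph-disk prepare /dev/sdc /dev/sda.)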

4. After adding the new hardware nodes, is it safe to remove the old Ceph nodes while I'm still running with a replication factor of 2?
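
The per-OSD removal procedure I had in mind, only after the cluster is back to HEALTH_OK on the new nodes, is roughly:

ceph osd out osd.<id>            # let the data drain off the OSD
# wait until 'ceph -s' shows all PGs active+clean again
stop ceph-osd id=<id>            # or: service ceph stop osd.<id>
ceph osd crush remove osd.<id>
ceph auth del osd.<id>
ceph osd rm osd.<id>

Is that the right order, and is it safe to do one OSD at a time with size=2?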

5. Do I need RAID 1 for the journal disks? If not, what will happen if one of the journal disks fails?
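
(For a planned journal move I assume the sequence is roughly the one below, but I don't know what the recovery path is if a journal SSD dies uncleanly:

service ceph stop osd.<id>
ceph-osd -i <id> --flush-journal     # flush the old journal to the data disk
# repoint the journal (symlink or osd_journal setting) at the new device, then:
ceph-osd -i <id> --mkjournal
service ceph start osd.<id>)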

6. Should I use any RAID level for the drives on the OSD nodes, or is it better to go without RAID?

Your advice is highly appreciated.

Best Regards,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
