Unfortunately, even after removing all my custom kernel boot parameters, the performance did not improve.
Currently:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1"

Before:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1 intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll numa=off"
This is extremely puzzling. Any ideas or suggestions for troubleshooting it will be GREATLY appreciated.
Steven
On 2 February 2018 at 10:51, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
Hi Mark,
Thanks
My pools are using replication = 2
I'll re-enable numa and report back
Steven

On 2 February 2018 at 10:48, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
Not sure if this info is of any help; please be aware I am also just in a testing phase with Ceph.
I don't know how numa=off is interpreted by the OS. If it just hides the NUMA topology, you could still run into the known NUMA issues. That is why I have numad running.
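(For checking, something like the following shows whether the kernel still exposes multiple NUMA nodes and whether numad is active; just the usual checks, nothing cluster-specific:

numactl --hardware      # NUMA nodes the kernel exposes
lscpu | grep -i numa    # node count and CPU-to-node mapping
systemctl status numad  # whether numad is running
)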
Furthermore, when I put an OSD 'out' it also shows a 0 in the REWEIGHT column. So I guess your osd.1 is not participating either? If so, that would not be great if you are testing 3x replication with only 2 disks.
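(If it really was marked out, putting it back for the test would be something like:

ceph osd in osd.1    # REWEIGHT should return to 1.00000
ceph osd tree        # verify

just an illustration, not output from your cluster.)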
I have got this on SATA 5400rpm disks, replicated pool size 3.
rados bench -p rbd 30 write --id rbd
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
  20      16       832       816   163.178       180     0.157838    0.387074
  21      16       867       851   162.073       140     0.157289     0.38817
  22      16       900       884   160.705       132     0.224024    0.393674
  23      16       953       937   162.934       212     0.530274    0.388189
  24      16       989       973   162.144       144     0.209806    0.389644
  25      16      1028      1012   161.898       156     0.118438    0.391057
  26      16      1067      1051    161.67       156     0.248463     0.38977
  27      16      1112      1096   162.348       180     0.754184    0.392159
  28      16      1143      1127   160.977       124     0.439342    0.393641
  29      16      1185      1169   161.219       168    0.0801006    0.393004
  30      16      1221      1205   160.644       144     0.224278     0.39363
Total time run: 30.339270
Total writes made: 1222
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 161.111
Stddev Bandwidth: 24.6819
Max bandwidth (MB/sec): 212
Min bandwidth (MB/sec): 120
Average IOPS: 40
Stddev IOPS: 6
Max IOPS: 53
Min IOPS: 30
Average Latency(s): 0.396239
Stddev Latency(s): 0.249998
Max latency(s): 1.29482
Min latency(s): 0.06875
-----Original Message-----
From: Steven Vacaroaia [mailto:stef97@xxxxxxxxx]
Sent: Friday, 2 February 2018 15:25
To: ceph-users
Subject: ceph luminous performance - disks at 100%, low network utilization
Hi,
I have been struggling to get my test cluster to behave (from a performance perspective).
Dell R620, 64 GB RAM, 1 CPU, numa=off, PERC H710, RAID 0, enterprise 10K disks.
No SSDs, just plain HDDs.
Local tests (dd, hdparm) confirm my disks are capable of delivering 200 MB/s.
fio with 15 jobs indicates about 100 MB/s.
ceph tell osd bench shows about 400 MB/s.
rados bench with 1 thread provides 3 MB/s.
rados bench with 32 threads against 2 OSDs (one per server) barely touches 10 MB/s (invocations sketched below).
Adding a third server/OSD improves performance only slightly (11 MB/s).
atop shows disk usage at 100% for extended periods of time
Network usage is very low
Nothing else is "red"
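For reference, the rados bench runs mentioned above were invocations along these lines (the pool name "rbd" is just an assumption here, substitute the test pool actually used):

rados bench -p rbd 60 write -t 1     # single outstanding op
rados bench -p rbd 60 write -t 32    # 32 concurrent ops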
I have removed all custom TCP settings and left ceph.conf mostly at defaults.
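(For context, "mostly defaults" means a ceph.conf of roughly this shape; the fsid, mon_host and network values below are placeholders, not the real ones from this cluster:

[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_host = 10.0.0.1,10.0.0.2,10.0.0.3
public_network = 10.0.0.0/24
cluster_network = 10.0.1.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

i.e. no custom OSD, BlueStore or network tuning.)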
What am I missing?
Many thanks
Steven
ceph osd tree
 ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
  0   hdd 0.54529         osd.0       up  1.00000 1.00000
 -5       0.54529     host osd02
  1   hdd 0.54529         osd.1       up        0 1.00000
 -7             0     host osd04
-17       0.54529     host osd05
  2   hdd 0.54529         osd.2       up  1.00000 1.00000
[root@osd01 ~]# ceph tell osd.0 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"bytes_per_sec": 452125657
}
[root@osd01 ~]# ceph tell osd.2 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"bytes_per_sec": 340553488
}
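(Converted, those bytes_per_sec values are roughly 452125657 / 1048576 ≈ 431 MB/s for osd.0 and 340553488 / 1048576 ≈ 325 MB/s for osd.2, which is where the ~400 MB/s figure above comes from.)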
hdparm -tT /dev/sdc
/dev/sdc:
Timing cached reads: 5874 MB in 1.99 seconds = 2948.51 MB/sec
Timing buffered disk reads: 596 MB in 3.01 seconds = 198.17 MB/sec
fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=15 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 15 processes
Jobs: 15 (f=15): [W(15)] [100.0% done] [0KB/104.9MB/0KB /s] [0/26.9K/0 iops] [eta 00m:00s]
fio --filename=/dev/sdc --direct=1 --sync=1 --rw=write --bs=4k --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 5 processes
Jobs: 5 (f=5): [W(5)] [100.0% done] [0KB/83004KB/0KB /s] [0/20.8K/0 iops] [eta 00m:00s]
_________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com