Thanks Christian,
On 17-Feb-2016, at 7:25 AM, Christian Balzer <chibi@xxxxxxx> wrote:
Hello,

On Mon, 15 Feb 2016 21:10:33 +0530 Swapnil Jain wrote:

For most of you CEPH on ARMv7 might not sound good. This is our setup and our FIO testing report. I am not able to understand ….

Just one OSD per Microserver as in your case should be fine.
As always, use atop (or similar) on your storage servers when running these tests to see where your bottlenecks are (HDD/network/CPU).

1) Are these results good or bad
2) Write is much better than read, whereas read should be better.

Your testing is flawed, more below.

Hardware:
8 x ARMv7 MicroServer with 4 x 10G Uplink
Each MicroServer with: 2GB RAM

Barely OK for one OSD, not enough if you run MONs as well on it (as you do).

Dual Core 1.6 GHz processor
2 x 2.5 Gbps Ethernet (1 for Public / 1 for Cluster Network)
1 x 3TB SATA HDD
1 x 128GB MSata Flash
Exact model/maker please.
It's a Seagate ST3000NC000 & Phison mSATA
Software:
Debian 8.3 32bit
ceph version 9.2.0-25-gf480cea
Setup:
3 MON (Shared with 3 OSD)
8 OSD
Data on 3TB SATA with XFS
Journal on 128GB MSata Flash
pool with replica 1

Not a very realistic test of course.
For a production, fault resilient cluster you would have to divide your results by 3 (at least).

500GB image with 4M object size
FIO command: fio --name=unit1 --filename=/dev/rbd1 --bs=4k --runtime=300 --readwrite=write
If that is your base FIO command line, I'm assuming you mounted that image on the client via the kernel RBD module?
Yes, it's via the kernel RBD module.
Either way, the main reason you're seeing writes being faster than reads is that with this command line (no direct=1 flag) fio will use the page cache on your client host for writes, speeding things up dramatically. To get a realistic idea of your cluster's ability, use direct=1 and also look into rados bench.
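
As a rough sketch of the suggestion above, the test matrix could be regenerated with direct I/O. The block sizes, access patterns and output file names are inferred from the result file names in this thread; the pool name `rbd` and the 60-second duration are assumptions, not something stated by either poster. The script only prints the command lines, because write tests against /dev/rbd1 overwrite whatever is on the image.

```shell
#!/bin/sh
# Sketch only: print fio command lines that bypass the client page cache.
# DEV and RUNTIME default to the values used in the thread.
DEV="${DEV:-/dev/rbd1}"
RUNTIME="${RUNTIME:-300}"
POOL="${POOL:-rbd}"   # assumption: hypothetical pool name

# One fio invocation per block size and access pattern, each with --direct=1.
fio_matrix() {
    for bs in 4k 16k 64k 128k 1M; do
        for rw in read write randread randwrite; do
            echo "fio --name=unit1 --filename=$DEV --bs=$bs --runtime=$RUNTIME --readwrite=$rw --direct=1 --output=$bs-$rw.txt"
        done
    done
}

# rados bench: write first (keeping the objects), then sequential read, then clean up.
rados_bench_cmds() {
    echo "rados bench -p $POOL 60 write --no-cleanup"
    echo "rados bench -p $POOL 60 seq"
    echo "rados -p $POOL cleanup"
}

fio_matrix
rados_bench_cmds
```

To actually run the tests, review the printed lines first and then pipe them to `sh` on the client.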
Another reason for the slow reads is that Ceph (RBD) does badly with regard to read-ahead; setting /sys/block/rbd1/queue/read_ahead_kb to something like 2048 should improve things.
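
A minimal sketch of the read-ahead change, assuming the image is mapped as rbd1 on the client and the shell runs as root. The `SYSFS_ROOT` variable is a hypothetical override added here only so the function can be tried against a scratch directory; it is not a kernel or Ceph setting.

```shell
# Sketch only: set the read-ahead size (in KB) for a block device via sysfs.
# Usage: set_read_ahead <device> <kilobytes>
set_read_ahead() {
    dev="$1"
    kb="$2"
    f="${SYSFS_ROOT:-/sys}/block/$dev/queue/read_ahead_kb"
    # Fail early if the device is not mapped or we lack permission.
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    echo "$kb" > "$f"
    # Print the value now in effect so the change can be verified.
    cat "$f"
}

# Example (as root on the client):
# set_read_ahead rbd1 2048
```

The setting does not persist across a reboot or a re-map of the image, so it would need to be reapplied, for example from a udev rule or a startup script.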
That all being said, your read values look awfully low.
Thanks again for the suggestion. Below are some results using rados bench; here read looks much better than write. Still, is it good, or can it be better? I also checked atop and couldn't see any bottleneck, except that the sda disk was busy 80-90% of the time during the test.
WRITE Throughput (MB/sec): 297.544
WRITE Average Latency: 0.21499
READ Throughput (MB/sec): 478.026
READ Average Latency: 0.133818
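
To put the write figure in context, the earlier rule of thumb (divide replica-1 results by 3 for a fault-resilient cluster) can be applied as simple arithmetic. This is only a rough estimate under the assumption of a 3-replica pool; it is not a measurement, and it is applied to the write number only.

```shell
# Rough estimate only: divide the replica-1 write throughput by the replica count.
WRITE_MBPS=297.544   # rados bench write result reported in this thread
REPLICAS=3           # assumption: replica count of a production pool

estimate_replicated() {
    awk -v bw="$1" -v n="$2" 'BEGIN { printf "%.1f\n", bw / n }'
}

estimate_replicated "$WRITE_MBPS" "$REPLICAS"   # prints 99.2
```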

-- Swapnil

Christian

Client:
Ubuntu on Intel 24core/16GB RAM
10G Ethernet
Result for different tests
128k-randread.txt: read : io=2587.4MB, bw=8830.2KB/s, iops=68, runt=300020msec
128k-randwrite.txt: write: io=48549MB, bw=165709KB/s, iops=1294, runt=300005msec
128k-read.txt: read : io=26484MB, bw=90397KB/s, iops=706, runt=300002msec
128k-write.txt: write: io=89538MB, bw=305618KB/s, iops=2387, runt=300004msec
16k-randread.txt: read : io=383760KB, bw=1279.2KB/s, iops=79, runt=300001msec
16k-randwrite.txt: write: io=8720.7MB, bw=29764KB/s, iops=1860, runt=300002msec
16k-read.txt: read : io=27444MB, bw=93676KB/s, iops=5854, runt=300001msec
16k-write.txt: write: io=87811MB, bw=299726KB/s, iops=18732, runt=300001msec
1M-randread.txt: read : io=10439MB, bw=35631KB/s, iops=34, runt=300008msec
1M-randwrite.txt: write: io=98943MB, bw=337721KB/s, iops=329, runt=300004msec
1M-read.txt: read : io=25717MB, bw=87779KB/s, iops=85, runt=300007msec
1M-write.txt: write: io=74264MB, bw=253487KB/s, iops=247, runt=300001msec
4k-randread.txt: read : io=116920KB, bw=399084B/s, iops=97, runt=300002msec
4k-randwrite.txt: write: io=5579.2MB, bw=19043KB/s, iops=4760, runt=300004msec
4k-read.txt: read : io=27032MB, bw=92271KB/s, iops=23067, runt=300001msec
4k-write.txt: write: io=92955MB, bw=317284KB/s, iops=79320, runt=300001msec
64k-randread.txt: read : io=1400.2MB, bw=4778.2KB/s, iops=74, runt=300020msec
64k-randwrite.txt: write: io=27676MB, bw=94467KB/s, iops=1476, runt=300005msec
64k-read.txt: read : io=27805MB, bw=94909KB/s, iops=1482, runt=300002msec
64k-write.txt: write: io=95484MB, bw=325917KB/s, iops=5092, runt=300003msec

--
Swapnil Jain | Swapnil@xxxxxxxxx
Solution Architect & Red Hat Certified Instructor
RHC{A,DS,E,I,SA,SA-RHOS,VA}, CE{H,I}, CC{DA,NA}, MCSE, CNE

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/