Hello,

On Wed, 17 Feb 2016 21:47:31 +0530 Swapnil Jain wrote:

> Thanks Christian,
> 
> > On 17-Feb-2016, at 7:25 AM, Christian Balzer <chibi@xxxxxxx> wrote:
> > 
> > Hello,
> > 
> > On Mon, 15 Feb 2016 21:10:33 +0530 Swapnil Jain wrote:
> > 
> >> For most of you CEPH on ARMv7 might not sound good. This is our setup
> >> and our FIO testing report. I am not able to understand ….
> >> 
> > Just one OSD per Microserver as in your case should be fine.
> > As always, use atop (or similar) on your storage servers when running
> > these tests to see where your bottlenecks are (HDD/network/CPU).
> > 
> >> 1) Are these results good or bad?
> >> 2) Write is much better than read, whereas read should be better.
> >> 
> > Your testing is flawed, more below.
> > 
> >> Hardware:
> >> 
> >> 8 x ARMv7 MicroServer with 4 x 10G Uplink
> >> 
> >> Each MicroServer with:
> >> 2GB RAM
> > Barely OK for one OSD, not enough if you run MONs as well on it (as
> > you do).
> > 
> >> Dual Core 1.6 GHz processor
> >> 2 x 2.5 Gbps Ethernet (1 for Public / 1 for Cluster Network)
> >> 1 x 3TB SATA HDD
> >> 1 x 128GB mSATA Flash
> > Exact model/maker please.
> 
> It's a Seagate ST3000NC000 & a Phison mSATA drive.

There's quite a large number of Phison mSATA drive models available, it
seems, and their specifications don't mention endurance (DWPD or TBW)...
Anyway, you will want to look into this to see if they are a good match
for Ceph journals:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

> >> Software:
> >> Debian 8.3 32bit
> >> ceph version 9.2.0-25-gf480cea
> >> 
> >> Setup:
> >> 
> >> 3 MON (shared with 3 OSD)
> >> 8 OSD
> >> Data on 3TB SATA with XFS
> >> Journal on 128GB mSATA Flash
> >> 
> >> pool with replica 1
> > Not a very realistic test of course.
> > For a production, fault-resilient cluster you would have to divide
> > your results by 3 (at least).
> > 
> >> 500GB image with 4M object size
> >> 
> >> FIO command: fio --name=unit1 --filename=/dev/rbd1 --bs=4k
> >> --runtime=300 --readwrite=write
> >> 
> > If that is your base FIO command line, I'm assuming you mounted that
> > image on the client via the kernel RBD module?
> 
> Yes, it's via the kernel RBD module.
> 
> > Either way, the main reason you're seeing writes being faster than
> > reads is that with this command line (no direct=1 flag) fio will use
> > the page cache on your client host for writes, speeding things up
> > dramatically.
> > To get a realistic idea of your cluster's ability, use direct=1 and
> > also look into rados bench.
> > 
> > Another reason for the slow reads is that Ceph (RBD) does badly with
> > regard to read-ahead; setting /sys/block/rbd1/queue/read_ahead_kb to
> > something like 2048 should improve things.
> > 
> > That all being said, your read values look awfully low.
> 
> Thanks again for the suggestion. Below are some results using rados
> bench; here read looks much better than write. Still, is it good or can
> it be better?

rados bench with default settings operates on 4MB blocks, which matches
Ceph objects, meaning it is optimized for giving the best performance
figures in terms of throughput.
In real-life situations you're likely to be more interested in IOPS than
MB/s.
If you run it with "-b 4096" (i.e. 4KB blocks) you're likely to see with
atop that your CPUs are getting much, MUCH more of a workout.
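A small-block run would look roughly like the lines below; this is
untested from here and "testpool" is only a placeholder name, so point it
at a throw-away pool of your own:

  # 4KB writes for 60 seconds with 16 concurrent ops; --no-cleanup keeps
  # the benchmark objects around so they can be read back afterwards
  rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup

  # random 4KB reads against the objects written above
  rados bench -p testpool 60 rand -t 16

Keep atop running on the OSD nodes during such a run and compare the CPU
and disk load with what you saw during the 4MB runs.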
> I also checked atop, couldn't see any bottleneck except that the sda
> disk was busy 80-90% of the time during the test.

Well, if that is true (on average) for all your nodes, then you have
found the bottleneck.
Also, which one is "sda", the HDD or the SSD?

> WRITE Throughput (MB/sec): 297.544
> WRITE Average Latency: 0.21499
> 
> READ Throughput (MB/sec): 478.026
> READ Average Latency: 0.133818

These are pretty good numbers for this setup indeed.
But again, with a replication size of 1 they're not representative of
reality at all.

Regards,

Christian

> —
> Swapnil
> 
> > Christian
> >> Client:
> >> 
> >> Ubuntu on Intel 24core/16GB RAM 10G Ethernet
> >> 
> >> Result for different tests:
> >> 
> >> 128k-randread.txt:  read : io=2587.4MB, bw=8830.2KB/s, iops=68, runt=300020msec
> >> 128k-randwrite.txt: write: io=48549MB, bw=165709KB/s, iops=1294, runt=300005msec
> >> 128k-read.txt:      read : io=26484MB, bw=90397KB/s, iops=706, runt=300002msec
> >> 128k-write.txt:     write: io=89538MB, bw=305618KB/s, iops=2387, runt=300004msec
> >> 16k-randread.txt:   read : io=383760KB, bw=1279.2KB/s, iops=79, runt=300001msec
> >> 16k-randwrite.txt:  write: io=8720.7MB, bw=29764KB/s, iops=1860, runt=300002msec
> >> 16k-read.txt:       read : io=27444MB, bw=93676KB/s, iops=5854, runt=300001msec
> >> 16k-write.txt:      write: io=87811MB, bw=299726KB/s, iops=18732, runt=300001msec
> >> 1M-randread.txt:    read : io=10439MB, bw=35631KB/s, iops=34, runt=300008msec
> >> 1M-randwrite.txt:   write: io=98943MB, bw=337721KB/s, iops=329, runt=300004msec
> >> 1M-read.txt:        read : io=25717MB, bw=87779KB/s, iops=85, runt=300007msec
> >> 1M-write.txt:       write: io=74264MB, bw=253487KB/s, iops=247, runt=300001msec
> >> 4k-randread.txt:    read : io=116920KB, bw=399084B/s, iops=97, runt=300002msec
> >> 4k-randwrite.txt:   write: io=5579.2MB, bw=19043KB/s, iops=4760, runt=300004msec
> >> 4k-read.txt:        read : io=27032MB, bw=92271KB/s, iops=23067, runt=300001msec
> >> 4k-write.txt:       write: io=92955MB, bw=317284KB/s, iops=79320, runt=300001msec
> >> 64k-randread.txt:   read : io=1400.2MB, bw=4778.2KB/s, iops=74, runt=300020msec
> >> 64k-randwrite.txt:  write: io=27676MB, bw=94467KB/s, iops=1476, runt=300005msec
> >> 64k-read.txt:       read : io=27805MB, bw=94909KB/s, iops=1482, runt=300002msec
> >> 64k-write.txt:      write: io=95484MB, bw=325917KB/s, iops=5092, runt=300003msec
> >> 
> >> —
> >> Swapnil Jain | Swapnil@xxxxxxxxx
> >> Solution Architect & Red Hat Certified Instructor
> >> RHC{A,DS,E,I,SA,SA-RHOS,VA}, CE{H,I}, CC{DA,NA}, MCSE, CNE
> > 
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com