-----Original Message-----
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
Sent: Saturday, February 23, 2013 1:57 AM
To: Kelvin Huang/WYHQ/Wiwynn
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Ceph scalar & replicas performance

On Thu, Feb 21, 2013 at 5:01 PM, <Kelvin_Huang@xxxxxxxxxx> wrote:
> Hi all,
> I have a problem after my scaling performance test!!
>
> Setup:
> Linux kernel: 3.2.0
> OS: Ubuntu 12.04
> Storage server: 11 HDD (each storage server has 11 OSDs, 7200 rpm, 1T) + 10GbE NIC + RAID card: LSI MegaRAID SAS 9260-4i
> For every HDD: RAID0, Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct
> Storage server number: 1 to 4
>
> Ceph version: 0.48.2
> Replicas: 2
>
> FIO cmd:
> [Sequential Read]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=read --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> [Sequential Write]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=write --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> [Random Read]
> fio --iodepth=32 --numjobs=8 --runtime=120 --bs=65536 --rw=randread --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> [Random Write]
> fio --iodepth=32 --numjobs=8 --runtime=120 --bs=65536 --rw=randwrite --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> A Ceph client creates a 1T RBD image for testing; the client also has a 10GbE NIC, Linux kernel 3.2.0, Ubuntu 12.04.
>
> Performance result:
> Bandwidth (MB/sec)
> ┌─────────────────┬─────────────────┬──────────────────┬─────────────┬──────────────┐
> │ Storage servers │ Sequential Read │ Sequential Write │ Random Read │ Random Write │
> ├─────────────────┼─────────────────┼──────────────────┼─────────────┼──────────────┤
> │ 1               │ 259             │ 76               │ 837         │ 26           │
> │ 2               │ 349             │ 121              │ 950         │ 45           │
> │ 3               │ 354             │ 108              │ 490         │ 71           │
> │ 4               │ 338             │ 103              │ 610         │ 89           │
> └─────────────────┴─────────────────┴──────────────────┴─────────────┴──────────────┘
>
> We expected bandwidth to increase with the number of storage servers in all cases, but the results do not show that!!
> Can you share your thoughts on how read/write bandwidth should change as storage servers are added?
>
> There's a bunch of stuff that could be weird here. Is your switch
> capable of handling all the traffic going over it? Have you
> benchmarked the drives and filesystems on each node individually to
> make sure they all have the same behavior, or are some of your
> additions slower than the others? (My money is on you having some slow
> drives that are dragging everything down.)

Okay, I will re-check the settings of each storage server, but I am still interested in the expected trend (Seq R/W and Random R/W) as the number of storage servers increases. Or do you have a similar experiment whose results you can share? Thanks!!

> In another case, we fixed the number of storage servers at 4 and varied the number of replicas from 2 to 4.
>
> Performance result:
>
> Bandwidth (MB/sec)
> ┌─────────────────┬─────────────────┬──────────────────┬─────────────┬──────────────┐
> │ Replicas        │ Sequential Read │ Sequential Write │ Random Read │ Random Write │
> ├─────────────────┼─────────────────┼──────────────────┼─────────────┼──────────────┤
> │ 2               │ 338             │ 103              │ 614         │ 89           │
> │ 3               │ 337             │ 76               │ 791         │ 62           │
> │ 4               │ 337             │ 60               │ 754         │ 43           │
> └─────────────────┴─────────────────┴──────────────────┴─────────────┴──────────────┘
>
> It is easy to understand that write bandwidth decreases as replicas increase, but why did read bandwidth not increase?
>
> Reads are always served from the "primary" OSD, but even if they
> weren't, you distribute the same number of reads over the same number
> of disks no matter how many replicas you have of each individual data
> block...
> But in particular the change in random read values that you're seeing
> indicates that your data is very noisy; I'm not sure I'd trust any of
> the values you're seeing, especially the weirder trends. It might be
> all noise and no real data value.
> -Greg
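
As a rough cross-check of the write numbers in the replica table, here is a minimal Python sketch. It assumes that every byte a client writes is stored once per replica and it ignores journal writes and any other overhead, so it is only an approximation. The names `WRITE_MBPS`, `backend_write_mbps` and `backend_table` are hypothetical and chosen for illustration; the input values are copied from the 4-server replica table in this thread.

```python
# Client-side write bandwidth (MB/s) from the replica table in this thread
# (4 storage servers). Keys are the replica counts.
WRITE_MBPS = {
    2: {"seq": 103, "rand": 89},
    3: {"seq": 76, "rand": 62},
    4: {"seq": 60, "rand": 43},
}


def backend_write_mbps(client_mbps, replicas):
    """Estimate the total write traffic hitting the disks.

    Assumption (not from the thread): each client byte is written exactly
    once per replica; journal writes and metadata are ignored.
    """
    return client_mbps * replicas


def backend_table(results):
    """Apply the estimate to every row of a {replicas: {pattern: MB/s}} table."""
    return {
        replicas: {
            pattern: backend_write_mbps(mbps, replicas)
            for pattern, mbps in row.items()
        }
        for replicas, row in results.items()
    }


if __name__ == "__main__":
    for replicas, row in backend_table(WRITE_MBPS).items():
        print(f"replicas={replicas}: seq={row['seq']} MB/s, rand={row['rand']} MB/s")
```

Under that assumption the estimated disk-side write traffic stays in a fairly narrow band (206 to 240 MB/s sequential, 172 to 186 MB/s random) while the client-visible figure drops, which fits the idea that the cluster's total write capacity is roughly fixed and is divided among the replicas.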