Hi all,
I think you may have missed this mail, so I am re-sending it. Thanks!

-----Original Message-----
From: Kelvin Huang/WYHQ/Wiwynn
Sent: Monday, February 25, 2013 9:58 AM
To: 'ceph-devel@xxxxxxxxxxxxxxx'
Cc: Eric YH Chen/WYHQ/Wiwynn
Subject: RE: Ceph scalar & replicas performance

-----Original Message-----
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
Sent: Saturday, February 23, 2013 1:57 AM
To: Kelvin Huang/WYHQ/Wiwynn
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Ceph scalar & replicas performance

On Thu, Feb 21, 2013 at 5:01 PM, <Kelvin_Huang@xxxxxxxxxx> wrote:
>> Hi all,
>> I have a question after my scaling performance test.
>>
>> Setup:
>> Linux kernel: 3.2.0
>> OS: Ubuntu 12.04
>> Storage server: 11 HDDs (each storage server runs 11 OSDs; 7200 rpm, 1 TB)
>> + 10GbE NIC + RAID card LSI MegaRAID SAS 9260-4i
>> For every HDD: single-disk RAID0; Write Policy: Write Back with BBU;
>> Read Policy: ReadAhead; IO Policy: Direct
>> Number of storage servers: 1 to 4
>>
>> Ceph version: 0.48.2
>> Replicas: 2
>>
>> FIO commands:
>>
>> [Sequential Read]
>> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=read --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>>
>> [Sequential Write]
>> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=write --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>>
>> [Random Read]
>> fio --iodepth=32 --numjobs=8 --runtime=120 --bs=65536 --rw=randread --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>>
>> [Random Write]
>> fio --iodepth=32 --numjobs=8 --runtime=120 --bs=65536 --rw=randwrite --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>>
>> We create a 1 TB RBD image on the Ceph client for testing; the client
>> also has a 10GbE NIC, Linux kernel 3.2.0, Ubuntu 12.04.
>>
>> Performance result, bandwidth (MB/sec):
>>
>> Storage servers | Sequential Read | Sequential Write | Random Read | Random Write
>> ----------------+-----------------+------------------+-------------+-------------
>>        1        |       259       |        76        |     837     |      26
>>        2        |       349       |       121        |     950     |      45
>>        3        |       354       |       108        |     490     |      71
>>        4        |       338       |       103        |     610     |      89
>>
>> We expected bandwidth to increase in every case as storage servers are
>> added, but the results do not show that.
>> Can you share your view of how read/write bandwidth should behave as the
>> number of storage servers increases?

> There's a bunch of stuff that could be weird here. Is your switch
> capable of handling all the traffic going over it? Have you
> benchmarked the drives and filesystems on each node individually to
> make sure they all have the same behavior, or are some of your
> additions slower than the others? (My money is on you having some slow
> drives that are dragging everything down.)

Okay, I will re-check the settings of each storage server, but I am still
interested in the expected trend (sequential R/W and random R/W) as storage
servers are added. Do you have a similar experiment whose results you can
share? Thanks!
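[Editor's note, not part of the original thread: one minimal way to act on the suggestion above is to run the same fio profile against every OSD filesystem on a node and compare the per-disk numbers; a disk that is clearly slower than its peers is the kind of straggler being described. The OSD mount pattern /var/lib/ceph/osd/ceph-*, the scratch-file name, and the 2 GB test size are assumptions; adjust them to the actual layout.]

  #!/bin/bash
  # Sketch: benchmark each OSD filesystem on this node with the same fio
  # profile and compare per-disk results. Paths and sizes are assumptions.
  for dir in /var/lib/ceph/osd/ceph-*; do          # adjust to your OSD data dirs
      echo "=== $dir ==="
      fio --name=per-disk-seqwrite --filename="$dir/fio-scratch.tmp" --size=2G \
          --rw=write --bs=65536 --iodepth=32 --numjobs=1 --runtime=60 \
          --direct=1 --ioengine=libaio --group_reporting
      rm -f "$dir/fio-scratch.tmp"                 # remove the scratch file
  done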
>> In another case, we fixed the cluster at 4 storage servers and varied the
>> number of replicas from 2 to 4.
>>
>> Performance result, bandwidth (MB/sec):
>>
>> Replicas | Sequential Read | Sequential Write | Random Read | Random Write
>> ---------+-----------------+------------------+-------------+-------------
>>    2     |       338       |       103        |     614     |      89
>>    3     |       337       |        76        |     791     |      62
>>    4     |       337       |        60        |     754     |      43
>>
>> It is easy to see why write bandwidth decreases as replicas increase, but
>> why did read bandwidth not increase?

> Reads are always served from the "primary" OSD, but even if they
> weren't, you distribute the same number of reads over the same number
> of disks no matter how many replicas you have of each individual data
> block...
> But in particular the change in random read values that you're seeing
> indicates that your data is very noisy — I'm not sure I'd trust any of
> the values you're seeing, especially the weirder trends. It might be
> all noise and no real data value.
> -Greg
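[Editor's note, a sketch of the expected write trend, not from the thread: with r replicas every client byte is written r times, so on a disk-bound cluster the client-visible write bandwidth should fall roughly as 1/r, while reads go to primaries only and should stay flat. Anchoring the model to the measured 103 MB/s at 2 replicas:]

  # Expected sequential write bandwidth under a simple 1/replicas model,
  # using the measured 103 MB/s at 2 replicas as the baseline (journal and
  # network overhead ignored).
  base_bw=103   # MB/s measured with 2 replicas
  base_rep=2
  for rep in 2 3 4; do
      awk -v b="$base_bw" -v br="$base_rep" -v r="$rep" \
          'BEGIN { printf "replicas=%d: expected ~%.0f MB/s\n", r, b * br / r }'
  done
  # Prints roughly 103, 69, 52 MB/s; the measured 103/76/60 MB/s follows the
  # same downward trend.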