On Thu, Feb 21, 2013 at 5:01 PM, <Kelvin_Huang@xxxxxxxxxx> wrote:
> Hi all,
> I have some questions after my scaling performance test!
>
> Setup:
> Linux kernel: 3.2.0
> OS: Ubuntu 12.04
> Storage server: 11 HDD (each storage server has 11 OSDs, 7200 rpm, 1 TB) + 10GbE NIC + RAID card: LSI MegaRAID SAS 9260-4i
> For every HDD: RAID0, Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy: Direct
> Storage server number: 1 to 4
>
> Ceph version: 0.48.2
> Replicas: 2
>
> FIO cmd:
> [Sequential Read]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=read --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> [Sequential Write]
> fio --iodepth=32 --numjobs=1 --runtime=120 --bs=65536 --rw=write --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> [Random Read]
> fio --iodepth=32 --numjobs=8 --runtime=120 --bs=65536 --rw=randread --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> [Random Write]
> fio --iodepth=32 --numjobs=8 --runtime=120 --bs=65536 --rw=randwrite --ioengine=libaio --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10
>
> We used a Ceph client to create a 1 TB RBD image for testing; the client also has a 10GbE NIC, Linux kernel 3.2.0, Ubuntu 12.04.
>
> Performance result, bandwidth (MB/sec):
>
> Storage servers | Sequential Read | Sequential Write | Random Read | Random Write
> ----------------+-----------------+------------------+-------------+-------------
>        1        |       259       |        76        |     837     |      26
>        2        |       349       |       121        |     950     |      45
>        3        |       354       |       108        |     490     |      71
>        4        |       338       |       103        |     610     |      89
>
> We expected bandwidth to increase in every case as storage servers are added, but the results do not show that!
> Can you share your thoughts on how read/write bandwidth should behave as storage servers are added?

There's a bunch of stuff that could be weird here. Is your switch capable of handling all the traffic going over it? Have you benchmarked the drives and filesystems on each node individually to make sure they all have the same behavior, or are some of your additions slower than the others? (My money is on you having some slow drives that are dragging everything down.)

> In another case, we fixed the cluster at 4 storage servers and adjusted the number of replicas from 2 to 4.
>
> Performance result, bandwidth (MB/sec):
>
> Replicas | Sequential Read | Sequential Write | Random Read | Random Write
> ---------+-----------------+------------------+-------------+-------------
>     2    |       338       |       103        |     614     |      89
>     3    |       337       |        76        |     791     |      62
>     4    |       337       |        60        |     754     |      43
>
> It is easy to see why write bandwidth decreases as the number of replicas increases, but why didn't read bandwidth increase?

Reads are always served from the "primary" OSD, but even if they weren't, you distribute the same number of reads over the same number of disks no matter how many replicas you have of each individual data block... But in particular, the change in random read values that you're seeing indicates that your data is very noisy; I'm not sure I'd trust any of the values you're seeing, especially the weirder trends. It might be all noise and no real data value.
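To compare the raw drives, you could run a short read-only fio job against each OSD disk on every node and look for outliers. Something along these lines (the device names are just examples for an 11-disk node; substitute your actual OSD data disks, and --readonly keeps fio from writing to them):

for dev in /dev/sd{b..l}; do     # example device names, one per OSD disk
    echo "=== $dev ==="
    fio --name=drivecheck --filename=$dev --readonly --rw=read --bs=65536 \
        --iodepth=32 --ioengine=libaio --direct=1 --runtime=30
done

If one or two drives come back noticeably slower than the rest, that alone can explain why adding servers doesn't add bandwidth.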
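To see the read path concretely: every object maps to a placement group, and the first OSD in that PG's acting set is the primary that serves reads. You can check where any given object lands with ceph osd map (pool and object name below are just placeholders; "rbd info <image>" shows the block name prefix your image's objects actually use):

# pool 'rbd' and the object name are placeholders; use a real object name
# built from your image's block name prefix
ceph osd map rbd rb.0.1234.000000000000

The first OSD listed in the acting set is the primary; raising the replica count only lengthens that list, it doesn't change which single OSD answers reads for the object, so the set of disks doing read work stays the same.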
-Greg