Re: Why is bandwidth not fully saturated?


 



Hello,

On Sun, 5 Feb 2017 02:03:57 +0100 Marc Roos wrote:

>  
> 
> I have a 3 node test cluster with one osd per node. And I write a file 
> to a pool with size 1. Why doesn’t ceph just use the full 110MB/s of 
> the network (as with the default rados bench test)? Does ceph 'reserve' 
> bandwidth for other concurrent connections? Can this be tuned?
> 

Firstly, benchmarking from within the cluster will always give you skewed
results, typically better than what a client would see since some actions
will be local.

Secondly, you're not telling us much about the cluster, but I'll assume
these are plain drives with in-line (co-located) journals.
The same goes for your rados bench run, so I'll assume defaults there.

What you're seeing here is almost certainly the difference between a
single client (your "put") and multiple ones (rados bench runs 16
concurrent threads by default).

So rados bench gets to distribute all its writes amongst all OSDs and is
able to saturate things, while your put has to wait on a single OSD and
the latency of the network.
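
You can see this for yourself by forcing rados bench down to a single
thread; something along these lines should do (pool name taken from your
put example, -t is the number of concurrent operations, 16 by default):

  # default: 16 concurrent 4MB writes, spread across all OSDs
  rados bench -p data1 30 write

  # one outstanding write at a time, behaves much like your put
  rados bench -p data1 30 write -t 1
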
A single SATA HDD can typically do about 150MB/s of writes, but only for
sequential ones, which neither RBD nor your put is. The journal takes
half of that, the FS journal and other overheads take some more, so about
40MB/s of effective performance doesn't surprise me at all.
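
Back-of-the-envelope, assuming journals co-located on the same spindles
(rough numbers, the exact figures depend on disk and filesystem):

  ~150 MB/s    raw sequential write speed of the disk
  ~ 75 MB/s    after the Ceph journal (every byte is written twice)
  ~40-50 MB/s  after FS journaling, metadata and seek overhead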

Run this test with atop or iostat active on all machines to confirm.
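
For example, while the put is running:

  iostat -xm 2    # per-device throughput and utilisation, 2s interval
  atop 2          # same idea, full-screen, also shows CPU and per-NIC traffic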


As for your LACP question, it is what it is.
With a sufficient number of real clients and OSDs things will get
distributed eventually, but a single client (a single TCP flow, really)
will never get more than one link's worth of bandwidth.
Typically people find that while bandwidth is certainly desirable (up to a
point), it is the lower latency of faster links like 10/25/40Gb/s Ethernet
or InfiniBand that makes their clients (VMs) happier.
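
If you want to check how your bond hashes traffic, the bonding driver
reports it (bond0 is just an example interface name):

  grep -i -e 'hash' -e 'slave interface' /proc/net/bonding/bond0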

Christian

> Putting from ram drive on first node
> time rados -p data1 put test3.img test3.img
> 
> --net/eth0----net/eth1- --dsk/sda-----dsk/sdb-----dsk/sdc--
>  recv  send: recv  send| read  writ: read  writ: read  writ
> 2440B 4381B:1973B    0 |   0     0 :   0     0 :   0     0
> 1036B 3055B:1958B  124B|   0     0 :   0     0 :   0     0
> 1382B 3277B:1316B    0 |   0     0 :   0     0 :   0     0
> 1227B 2850B:1243B    0 |   0     0 :   0     0 :   0     0
> 1216B  120k:2300B    0 |   0     0 :   0     0 :   0     0
> 1714B 8257k:  15k    0 |   0  4096B:   0     0 :   0     0
> 1006B   24M:  40k    0 |   0    14k:   0     0 :   0     0
> 1589B   36M:  58k    0 |   0    32k:   0     0 :   0     0
>  856B   36M:  57k    0 |   0     0 :   0     0 :   0     0
>  856B   40M:  64k    0 |   0     0 :   0     0 :   0     0
> 2031B   36M:  58k    0 |   0     0 :   0     0 :   0     0
>  865B   36M:  58k    0 |   0    24k:   0     0 :   0     0
> 1713B   39M:  61k    0 |   0    37k:   0     0 :   0     0
>  997B   38M:  59k    0 |   0     0 :   0     0 :   0     0
>   66B   36M:  58k    0 |   0     0 :   0     0 :   0     0
> 1782B   36M:  57k    0 |   0     0 :   0     0 :   0     0
>  931B   36M:  58k    0 |   0  8192B:   0     0 :   0     0
>  931B   36M:  57k    0 |   0    45k:   0     0 :   0     0
>  724B   36M:  57k    0 |   0     0 :   0     0 :   0     0
>  922B   28M:  47k    0 |   0     0 :   0     0 :   0     0
> 2506B 4117B:2261B    0 |   0     0 :   0     0 :   0     0
>  865B 7630B:2631B    0 |   0    15k:   0     0 :   0     0
> 
> 
> Goes to 3rd node
> 
> --net/eth0----net/eth1- --dsk/sda-----dsk/sdb-----dsk/sdc-----dsk/sdd--
>  recv  send: recv  send| read  writ: read  writ: read  writ: read  writ
>   66B 1568B: 733B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 2723B 4979B:1469B    0 |   0     0 :   0     0 :   0     0 :   0     0
>   66B 1761B:1347B    0 |   0     0 :   0     0 :   0     0 :   0     0
>   66B 2480B: 119k    0 |   0     0 :   0     0 :   0     0 :   0     0
>  103k   22k:  12M    0 |   0  4096B:   0     0 :   0    12M:   0     0
>  784B   41k:  24M    0 |   0    17k:   0     0 :   0    24M:   0     0
> 1266B   63k:  38M    0 |   0    40k:   0     0 :   0    37M:   0     0
>   66B   60k:  39M    0 |   0     0 :   0     0 :   0    41M:   0     0
>   66B   60k:  37M    0 |   0     0 :   0     0 :   0    38M:   0     0
>  104k   62k:  38M    0 |   0     0 :   0     0 :   0    38M:   0     0
>   66B   61k:  38M    0 |   0    15k:   0     0 :   0    42M:   0     0
> 1209B   59k:  36M    0 |   0    44k:   0     0 :   0    39M:   0     0
>   66B   60k:  38M    0 |   0    87k:   0     0 :   0    39M:   0     0
>  980B   62k:  38M    0 |   0     0 :   0     0 :   0    41M:   0     0
>  103k   52k:  32M    0 |   0  4096B:   0     0 :  60k   42M:   0     0
>   66B   61k:  38M    0 |   0  8192B:   0     0 :   0    40M:   0     0
>  476B   58k:  36M    0 |   0    45k:   0     0 :   0    41M:   0     0
> 1514B   55k:  34M    0 |   0     0 :   0     0 :8192B   41M:   0     0
>  856B   42k:  24M    0 |   0     0 :   0     0 :   0    28M:   0     0
>  103k 3010B:1681B    0 |   0     0 :   0     0 :   0     0 :   0     0
>  126B 3363B:4187B    0 |   0    15k:   0     0 :   0     0 :   0     0
>  551B 2714B:  22k    0 |   0    40k:   0     0 :   0     0 :   0     0
>  724B 2073B: 919B    0 |   0     0 :   0     0 :   0     0 :   0     0
>   66B 2044B: 118k    0 |   0     0 :   0     0 :   0     0 :   0     0 
> 
> 2nd node does nothing, as expected
> 
> --net/eth0----net/eth1- --dsk/sda-----dsk/sdb-----dsk/sdc-----dsk/sdd--
>  recv  send: recv  send| read  writ: read  writ: read  writ: read  writ
>  348B  104k:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 4326B 3132B: 294B    0 |   0    14k:   0     0 :   0     0 :   0     0
>   24k 3783B:  66B    0 |   0    40k:   0     0 :   0     0 :   0     0
>  658B 1258B:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
>  658B 1044B: 190B    0 |   0     0 :   0     0 :   0     0 :   0     0
>  926B  105k:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 4891B 3197B:  66B    0 |   0    27k:   0     0 :   0     0 :   0     0
>   22k 1752B:  66B    0 |   0    44k:   0     0 :   0     0 :   0     0
> 1655B 1925B:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 1440B 1966B:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
>  710B  104k:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 4186B 2539B:  66B    0 |   0    16k:   0     0 :   0     0 :   0     0
>   22k 1677B:  66B    0 |   0    45k:   0     0 :   0     0 :   0     0
>  658B 1258B:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 2896B 3018B:  66B  124B|   0     0 :   0     0 :   0     0 :   0     0
>  710B  104k:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 4973B 3251B:  66B    0 |   0    23k:   0     0 :   0     0 :   0     0
>   22k 1752B:  66B    0 |   0    41k:   0     0 :   0     0 :   0     0
>  733B 1267B:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 1580B 1900B:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> 1368B  105k:  66B    0 |   0     0 :   0     0 :   0     0 :   0     0
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



