Re: read performance VS network usage


 



And to reply to myself…

 

The apparent network bandwidth on the clients is simply an artifact: dstat aggregates the bridge network interface and the physical interface, thus doubling the data…
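A minimal sketch of the double counting described here, assuming a Linux host where the same traffic shows up once on the physical NIC and once on the bridge. The interface names (`eth0`, `br0`) and the byte counters are made up for illustration; the text follows the `/proc/net/dev` layout.

```python
# Illustrative /proc/net/dev content; all numbers are invented.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0: 1000000     700    0    0    0     0          0         0    50000     300    0    0    0     0       0          0
   br0: 1000000     700    0    0    0     0          0         0    50000     300    0    0    0     0       0          0
"""


def rx_bytes_by_interface(proc_net_dev_text):
    """Parse /proc/net/dev-style text into {interface: received_bytes}."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, fields = line.split(":", 1)
        counters[name.strip()] = int(fields.split()[0])  # first field is rx bytes
    return counters


counters = rx_bytes_by_interface(SAMPLE)
# Assumption: a naive "total" sums every interface except loopback.
naive_total = sum(v for name, v in counters.items() if name != "lo")
physical_only = counters["eth0"]
print(naive_total, physical_only)
```

In this invented sample the naive total is exactly twice the physical counter, which is the effect described in the message. Watching only the physical interface avoids the problem.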

 

Ha ha ha.

Regards

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of SCHAER Frederic
Sent: Friday 24 April 2015 10:26
To: ceph-users@xxxxxxxxxxxxxx
Subject: [PROVENANCE INTERNET] Re: read performance VS network usage

 

OK, I must learn how to read dstat…

I mistook the recv column for the send column…

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
 15  22  43  16   0   4| 343M 7916k| 252M  659M|   0     0 |  78k  122k
 15  18  45  18   0   4| 368M 4500k| 271M  592M|   0     0 |  82k  138k

(…)

 

I also notice less network throughput with MTU=9000.

So… conclusion: the nodes do receive part of the data from their peers and forward it to the client (even with 4MB reads, assuming the bench honors the -b option).
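A rough model of that conclusion, as a sketch under stated assumptions rather than a description of what Ceph actually does internally: each host acts as primary for an equal share of the reads, the primary holds one of the k data shards locally and fetches the other k-1 from peers, and load is symmetric, so each host also ships the same volume of shards to other primaries. The function name is hypothetical, and the 370 MiB/s figure is the per-server client rate quoted in the original message further down the thread.

```python
def ec_read_traffic_per_node(client_rate, k):
    """Return (sent, received) per OSD host for a given client read rate.

    Simplified model: symmetric load, primary holds 1 of the k data shards,
    acknowledgements and protocol overhead ignored.
    """
    peer_rate = client_rate * (k - 1) / k  # shards fetched from (and sent to) peers
    sent = client_rate + peer_rate         # data to clients + shards to other primaries
    received = peer_rate                   # shards from other hosts
    return sent, received


sent, received = ec_read_traffic_per_node(client_rate=370.0, k=4)
print(f"sent ~{sent:.1f} MiB/s, received ~{received:.1f} MiB/s")
# prints: sent ~647.5 MiB/s, received ~277.5 MiB/s
```

Under this model the sent/received ratio is (2k-1)/(k-1), that is 7/3 or about 2.33 for k=4, whatever the client rate. The two dstat samples above give 659/252 (about 2.6) and 592/271 (about 2.2), which is in the same range.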

 

My last surprise concerns the clients:

usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  2   1  97   0   0   0| 718B  116k|   0     0 |   0     0 |1947  3148
 12  14  72   0   0   1|   0    28k| 764M 1910k|   0     0 |  25k   27k
 11  13  75   0   0   1|   0  4096B| 758M 1860k|   0     0 |  25k   27k
 13  14  71   0   0   1|   0  4096B| 785M 1815k|   0     0 |  25k   24k
 12  14  73   0   0   1|   0     0 | 839M 1960k|   0     0 |  25k   25k
 12  14  72   0   0   2|   0   548k| 782M 1873k|   0     0 |  24k   25k
 11  14  73   0   0   1|   0    44k| 782M 1924k|   0     0 |  25k   26k

 

 

They are also receiving much more data than rados bench reports (around 275MB/s each)… could that be some sort of data amplification?

 

Regards

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of SCHAER Frederic
Sent: Friday 24 April 2015 10:03
To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
Subject: [PROVENANCE INTERNET] Re: read performance VS network usage

 

Hi Nick,

 

Thanks for your explanation.

I doubt this is what’s happening, but I’ll first check disk I/O with a clean pool and clean bench data (discarding any existing cache…)

 

I’m using the following command to create the bench data (and bench writes) on all 5 clients:

rados -k ceph.client.admin.keyring -p testec bench 60 write -b 4194304 -t 16 --run-name "bench_`hostname -s`" --no-cleanup

 

Replace “write” with “seq” for the read bench.

As you can see, I do specify the -b option, although I wonder whether it affects the read bench; the help is unclear to me:

-b op_size set the size of write ops for put or benchmarking
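The write/seq substitution above can be sketched as a small helper that builds the argument list without running anything. The helper name and its parameters are hypothetical; the flags and values are copied from the command quoted in this message, and whether `rados bench ... seq` actually honors `-b` is exactly the open question raised here.

```python
import socket


def rados_bench_argv(mode, seconds=60, pool="testec",
                     keyring="ceph.client.admin.keyring",
                     op_size=4194304, threads=16, short_hostname=None):
    """Build the rados bench command line quoted above for 'write' or 'seq'."""
    if mode not in ("write", "seq"):
        raise ValueError("mode must be 'write' or 'seq'")
    # Equivalent of `hostname -s`: keep only the part before the first dot.
    host = short_hostname or socket.gethostname().split(".")[0]
    return [
        "rados", "-k", keyring, "-p", pool,
        "bench", str(seconds), mode,
        "-b", str(op_size), "-t", str(threads),
        "--run-name", f"bench_{host}", "--no-cleanup",
    ]


write_cmd = rados_bench_argv("write", short_hostname="client1")
read_cmd = rados_bench_argv("seq", short_hostname="client1")
print(" ".join(read_cmd))
```

Keeping the same `--run-name` for both phases matters, since the read bench needs to find the objects left behind by the write bench.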

 

Still, even if it didn’t work and rados bench were issuing 4KB reads, how could this explain that all 5 servers each receive 800MiB/s (mebibytes, not megabits…), and that on average they only send what each client receives?

Where would the extra ~400MiB (not bits) come from?

If the OSDs were reconstructing data from the other hosts’ data before sending it to the client, the OSD hosts would be sending much more data to their neighbor OSDs than my average client throughput (and not roughly the same amount), wouldn’t they?

I looked at the network interfaces, hoping the traffic came from localhost, but it did not: it came in through the physical network interface…

 

Still trying to understand ;)

 

Regards

 

From: Nick Fisk [mailto:nick@xxxxxxxxxx]
Sent: Thursday 23 April 2015 17:21
To: SCHAER Frederic; ceph-users@xxxxxxxxxxxxxx
Subject: RE: read performance VS network usage

 

Hi Frederic,

 

If you are using EC pools, the primary OSD requests the remaining shards of the object from the other OSDs, reassembles it, and then sends the data to the client. The entire object needs to be reconstructed even for a small IO operation, so 4KB reads could lead to quite a large IO amplification if you are using the default 4MB object size. I believe this is what you are seeing, although creating an RBD with a smaller object size can help reduce this.
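To put a number on that amplification, here is a small worked example that takes the description above at face value (the whole object is rebuilt for every read, whatever its size). The function names are hypothetical, and k=4 is the value from the original message in this thread; actual behavior may differ between Ceph releases.

```python
KIB = 1024
MIB = 1024 * KIB


def read_amplification(object_size, io_size):
    """Bytes reassembled on the primary per byte returned to the client."""
    return object_size / io_size


def remote_shard_bytes(object_size, k):
    """Bytes the primary pulls from peers, assuming it holds 1 of k data shards."""
    return object_size * (k - 1) / k


amplification = read_amplification(4 * MIB, 4 * KIB)
remote = remote_shard_bytes(4 * MIB, k=4)
print(amplification, remote / MIB)  # prints: 1024.0 3.0
```

So under this assumption a single 4KB read would move about 3 MiB between OSD hosts, while a full 4MB read would move the same 3 MiB for 4 MiB of useful data.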

 

Nick

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of SCHAER Frederic
Sent: 23 April 2015 15:40
To: ceph-users@xxxxxxxxxxxxxx
Subject: read performance VS network usage

 

Hi again,

 

On my testbed, I have 5 ceph nodes, each containing 23 OSDs (2TB btrfs drives). For these tests, I’ve set up a RAID0 on the 23 disks.

For now, I’m not using SSDs as I discovered my vendor apparently decreased their performance on purpose…

 

So: 5 server nodes, 3 of which are also MONs.

I also have 5 clients.

All of them have a single 10G NIC; I’m not using a private network.

I’m testing EC pools, with the failure domain set to hosts.

The EC pool k/m is set to k=4/m=1

I’m testing EC pools using the giant release (ceph-0.87.1-0.el7.centos.x86_64)

 

And… I just found out I had “limited” read performance.

While I was watching the stats using dstat on one server node, I noticed that during the rados (read) bench, all the server nodes sent about 370MiB/s on the network, which is the average speed I get per server, but they also all received about 750-800MiB/s on that same network. And 800MB/s is about as much as you can get on a 10G link…
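For reference, the raw line rate of a 10 Gbit/s link can be converted to the units used above. This is plain arithmetic that ignores Ethernet, IP and TCP overhead, so real payload throughput will be somewhat lower; the function name is hypothetical.

```python
def link_capacity(gbit_per_s):
    """Return the raw line rate as (MB/s, MiB/s), with no protocol overhead."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return bytes_per_s / 1e6, bytes_per_s / 2**20


mb_per_s, mib_per_s = link_capacity(10)
print(f"{mb_per_s:.0f} MB/s, {mib_per_s:.2f} MiB/s")
# prints: 1250 MB/s, 1192.09 MiB/s

# Fraction of the raw line rate that 800 MiB/s represents.
fraction = 800 / mib_per_s
print(f"{fraction:.0%}")  # prints: 67%
```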

 

I’m trying to understand why I see this inbound data flow:

- Why does a server node receive data at all during a read bench?

- Why is it about twice as much as the data the node is sending?

- Is this about verifying data integrity at read time?

 

I’m alone on the cluster, it’s not used anywhere else.

I will try tomorrow to see if adding a 2nd 10G port (with a private network this time) improves the performance, but I’m really curious to understand what the bottleneck is and what ceph is doing…

 

Looking at the write performance, I see the same kind of behavior: nodes send about half the amount of data they receive (600MB/300MB), but this might be because this time the client only sends the real data and the erasure coding happens behind the scenes (or not?)
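The "erasure coding happens behind the scenes" guess can be checked with a rough model. This is a sketch under assumptions, not a statement of how Ceph is implemented: the client sends the full object to the primary, the primary splits it into k data shards plus m coding shards of 1/k the object size each, keeps one shard and sends the other k+m-1 to peers, and load is symmetric across hosts. The function name is hypothetical; k=4 and m=1 come from the pool settings above, and 300 is the send figure quoted above.

```python
def ec_write_traffic_per_node(client_rate, k, m):
    """Return (sent, received) per OSD host for a given client write rate.

    Simplified model: symmetric load, acknowledgements and overhead ignored.
    """
    shard_rate = client_rate / k          # each shard carries 1/k of the data
    peer_rate = shard_rate * (k + m - 1)  # primary keeps one shard locally
    sent = peer_rate                      # shards pushed to other hosts
    received = client_rate + peer_rate    # client data + shards from other primaries
    return sent, received


sent, received = ec_write_traffic_per_node(client_rate=300.0, k=4, m=1)
print(sent, received)  # prints: 300.0 600.0
```

With k=4 and m=1 the model gives a received/sent ratio of exactly 2, which matches the 600MB/300MB observation.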

 

Any ideas?

 

Regards

Frederic

