heavy network traffic between mon and osd with auth messages

Hi,

I deployed a Ceph cluster with one monitor (172.27.27.5) and three OSDs (172.27.27.2-4). The servers are connected by a 1 Gbps network.
Once the services have started, and with no jobs submitted, the monitor and the OSDs communicate so frequently that they generate heavy network traffic: about 20 MB/s per OSD, or 60 MB/s in total at the monitor for the three OSDs.
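
For scale, here is a rough sketch of the arithmetic behind those numbers. It assumes decimal megabytes (1 Gbps = 125 MB/s) and ignores protocol overhead; the function name is only illustrative.

```python
def monitor_link_share(per_osd_mb_s, num_osds, link_gbps):
    """Return (total MB/s at the monitor, fraction of the link used).

    Assumption: 1 Gbps = 1000 Mbit/s = 125 MB/s, no protocol overhead.
    """
    link_mb_s = link_gbps * 1000 / 8
    total_mb_s = per_osd_mb_s * num_osds
    return total_mb_s, total_mb_s / link_mb_s


total, share = monitor_link_share(per_osd_mb_s=20, num_osds=3, link_gbps=1)
print(f"{total} MB/s at the monitor, {share:.0%} of the link")
```

Under these assumptions the idle cluster uses roughly half of the monitor's 1 Gbps link.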

"ceph -s" reports a healthy cluster (active+clean), and "ceph -w" shows that nothing is happening. When I set "debug ms = 1" in the conf file, the OSD logs fill up with messages (about 100 MB in several seconds) like:

2013-11-23 20:19:45.916950 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b56300 con 0x2b85540

2013-11-23 20:19:45.916967 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444189 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4da8a00 con 0x2b85540

2013-11-23 20:19:45.917016 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b51f80 con 0x2b85540

2013-11-23 20:19:45.917033 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444190 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4da8e00 con 0x2b85540

2013-11-23 20:19:45.917081 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b50900 con 0x2b85540

2013-11-23 20:19:45.917098 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444191 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dab200 con 0x2b85540

2013-11-23 20:19:45.917146 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x5380900 con 0x2b85540

2013-11-23 20:19:45.917163 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444192 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dab000 con 0x2b85540

2013-11-23 20:19:45.917213 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b533c0 con 0x2b85540

2013-11-23 20:19:45.917229 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444193 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4daa400 con 0x2b85540

2013-11-23 20:19:45.917278 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b54a40 con 0x2b85540

2013-11-23 20:19:45.917292 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444194 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dad800 con 0x2b85540

2013-11-23 20:19:45.917340 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b54c80 con 0x2b85540

2013-11-23 20:19:45.917361 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444195 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4daae00 con 0x2b85540

2013-11-23 20:19:45.917412 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b53cc0 con 0x2b85540

2013-11-23 20:19:45.917425 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444196 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4daac00 con 0x2b85540

2013-11-23 20:19:45.917479 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x2b56e40 con 0x2b85540

2013-11-23 20:19:45.917496 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444197 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dad600 con 0x2b85540

2013-11-23 20:19:45.917547 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3de80 con 0x2b85540

2013-11-23 20:19:45.917562 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444198 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4daa600 con 0x2b85540

2013-11-23 20:19:45.917612 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3a880 con 0x2b85540

2013-11-23 20:19:45.917631 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444199 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dab600 con 0x2b85540

2013-11-23 20:19:45.917682 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3aac0 con 0x2b85540

2013-11-23 20:19:45.917699 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444200 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dab400 con 0x2b85540

2013-11-23 20:19:45.917748 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3dc40 con 0x2b85540

2013-11-23 20:19:45.917764 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444201 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dae000 con 0x2b85540

2013-11-23 20:19:45.917814 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3da00 con 0x2b85540

2013-11-23 20:19:45.917830 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444202 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dae800 con 0x2b85540

2013-11-23 20:19:45.917879 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3a640 con 0x2b85540

2013-11-23 20:19:45.917968 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444203 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dafe00 con 0x2b85540

2013-11-23 20:19:45.918041 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3a400 con 0x2b85540

2013-11-23 20:19:45.918058 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444204 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4daee00 con 0x2b85540

2013-11-23 20:19:45.918110 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3d7c0 con 0x2b85540

2013-11-23 20:19:45.918127 7f17d27d3700  1 -- 172.27.27.4:6800/19077 <== mon.0 172.27.27.5:6789/0 444205 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (789481923 0 0) 0x4dae600 con 0x2b85540

2013-11-23 20:19:45.918176 7f17d27d3700  1 -- 172.27.27.4:6800/19077 --> 172.27.27.5:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0x4d3ec00 con 0x2b85540
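
To put a number on how fast this loop runs, a small sketch like the following could count the outgoing auth requests in such a log and estimate their rate. It assumes the line layout shown in the excerpt above (timestamp first, then "--> <peer> -- auth("); the function and pattern names are my own, not part of Ceph.

```python
import re
from datetime import datetime

# Outgoing auth request as it appears in the excerpt above.
# Replies ("<== ... auth_reply(") do not match this pattern.
AUTH_REQ = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) .* --> (\S+) -- auth\("
)


def auth_request_rate(lines):
    """Return (count, requests_per_second) for outgoing auth requests."""
    stamps = []
    for line in lines:
        m = AUTH_REQ.match(line.strip())
        if m:
            stamps.append(
                datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            )
    if len(stamps) < 2:
        return len(stamps), 0.0
    span = (stamps[-1] - stamps[0]).total_seconds()
    if span <= 0:
        return len(stamps), 0.0
    # N requests cover N-1 intervals between the first and last timestamp.
    return len(stamps), (len(stamps) - 1) / span
```

Passing the lines of an OSD log file (for example, an open file object) to auth_request_rate() returns the number of requests and the average rate over the logged interval.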


When I disabled "auth cluster required" in the conf file, the traffic disappeared. However, security may be affected, and radosgw stopped working with the error: error decoding block for decryption


Any clue? 
Thx~

-- 
ambling
