I enabled logging and ran the same tests.
Here is a link to the archive with the logs; they are from one node only
(the node where the active MDS was running):
https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0

Rados bench results:

# rados bench -p test 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
 Object prefix: benchmark_data_atl-fs11_4630
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        46        30   119.967       120  0.201327  0.348463
    2      16        88        72   143.969       168  0.132983  0.353677
    3      16       124       108   143.972       144  0.930837  0.383018
    4      16       155       139   138.976       124  0.899468  0.426396
    5      16       203       187   149.575       192  0.236534  0.400806
    6      16       243       227   151.309       160  0.835213  0.397673
    7      16       276       260   148.549       132  0.905989  0.406849
    8      16       306       290   144.978       120  0.353279  0.422106
    9      16       335       319   141.757       116   1.12114  0.428268
   10      16       376       360    143.98       164  0.418921   0.43351
   11      16       377       361   131.254         4  0.499769  0.433693
 Total time run:         11.206306
 Total writes made:      377
 Write size:             4194304
 Bandwidth (MB/sec):     134.567
 Stddev Bandwidth:       60.0232
 Max bandwidth (MB/sec): 192
 Min bandwidth (MB/sec): 0
 Average Latency:        0.474923
 Stddev Latency:         0.376038
 Max latency:            1.82171
 Min latency:            0.060877

# rados bench -p test 10 seq
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        61        45   179.957       180  0.010405   0.25243
    2      16       109        93   185.962       192  0.908263  0.284303
    3      16       151       135   179.965       168  0.255312  0.297283
    4      16       191       175    174.97       160  0.836727  0.330659
    5      16       236       220   175.971       180  0.009995  0.330832
    6      16       275       259   172.639       156   1.06855  0.345418
    7      16       311       295   168.545       144  0.907648  0.361689
    8      16       351       335   167.474       160  0.947688  0.363552
    9      16       390       374   166.196       156  0.140539  0.369057
 Total time run:        9.755367
 Total reads made:      401
 Read size:             4194304
 Bandwidth (MB/sec):    164.422
 Average Latency:       0.387705
 Max latency:           1.33852
 Min latency:           0.008064

# rados bench -p test 10 rand
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        55        39   155.938       156  0.773716  0.257267
    2      16        93        77   153.957       152  0.006573  0.339199
    3      16       135       119   158.629       168  0.009851  0.359675
    4      16       171       155   154.967       144  0.892027  0.359015
    5      16       209       193   154.369       152   1.13945  0.378618
    6      16       256       240    159.97       188  0.009965  0.368439
    7      16       295       279     159.4       156  0.195812  0.371259
    8      16       343       327   163.472       192  0.880587  0.370759
    9      16       380       364    161.75       148  0.113111  0.377983
   10      16       424       408   163.173       176  0.772274  0.379497
 Total time run:        10.518482
 Total reads made:      425
 Read size:             4194304
 Bandwidth (MB/sec):    161.620
 Average Latency:       0.393978
 Max latency:           1.36572
 Min latency:           0.006448

On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> Can you enable debugging on the client ("debug ms = 1", "debug client
> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
> again, and post them somewhere for me to look at?
>
> While you're at it, can you try rados bench and see what sort of
> results you get?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>> It is CephFS mounted via ceph-fuse.
>> I get the same results regardless of how many other clients
>> have this fs mounted or what they are doing.
>> The cluster runs on Debian Wheezy, kernel 3.2.0-4-amd64.
>>
>> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> Are these tests conducted using a local fs on RBD, or using CephFS?
>>> If CephFS, do you have multiple clients mounting the FS, and what are
>>> they doing? What client (kernel or ceph-fuse)?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>>> Hi
>>>>
>>>> I just built a new cluster using these quickstart instructions:
>>>> http://ceph.com/docs/master/start/
>>>>
>>>> And here is what I am seeing:
>>>>
>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>> real 0m0.081s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> And if I repeat the same command (when the files already exist):
>>>>
>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>> real 0m48.894s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> I was very surprised, so I then tried rewriting just a single file:
>>>>
>>>> # time echo 1 > 1.txt
>>>> real 0m3.133s
>>>> user 0m0.000s
>>>> sys 0m0.000s
>>>>
>>>> BTW, I don't think the problem is OSD speed or the network:
>>>>
>>>> # time sysbench --num-threads=1 --test=fileio --file-total-size=1G
>>>> --file-test-mode=rndrw prepare
>>>> 1073741824 bytes written in 23.52 seconds (43.54 MB/sec).
>>>>
>>>> Here is my ceph cluster status and version:
>>>>
>>>> # ceph -w
>>>>     cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
>>>>      health HEALTH_OK
>>>>      monmap e4: 4 mons at
>>>> {atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
>>>> election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
>>>>      mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
>>>>      osdmap e92: 4 osds: 4 up, 4 in
>>>>       pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
>>>>             881 GB used, 1683 GB / 2564 GB avail
>>>>                  192 active+clean
>>>>   client io 1820 B/s wr, 1 op/s
>>>>
>>>> # ceph -v
>>>> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>>>
>>>> All nodes are connected with a gigabit network.
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
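
The rewrite test quoted above reports only the total for all ten files. Timing each overwrite on its own would show whether every file is slow or only some of them. A minimal sketch, assuming GNU date (for the %N nanosecond format); the helper name time_rewrites and its two parameters are hypothetical and not taken from the thread:

```shell
# Time each write separately so a single slow file stands out.
# Hypothetical helper: $1 = target directory, $2 = number of files (default 10).
time_rewrites() {
    local dir="$1" count="${2:-10}" i start end
    for i in $(seq 1 "$count"); do
        start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
        echo "$i" > "$dir/$i.txt"      # same write as in the quoted test
        end=$(date +%s%N)
        echo "$i.txt $(( (end - start) / 1000000 )) ms"
    done
}
```

Running it twice against the CephFS mount point (for example `time_rewrites /path/to/cephfs 10`, where the path is a placeholder) gives per-file times for the create pass and then for the overwrite pass.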