On Fri, Oct 24, 2014 at 8:47 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
> Any update?
>

The short answer is that when the command is executed for the second time,
the MDS needs to truncate the file to zero length. The speed of truncating a
file is limited by the OSD speed. (Creating a file and writing data to it are
async operations, but truncating a file is a sync operation.)

Regards
Yan, Zheng

> On Tue, Oct 21, 2014 at 3:32 PM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>> Ouch, I think the client log is missing.
>> Here it goes:
>> https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0
>>
>> On Tue, Oct 21, 2014 at 3:22 PM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>> I enabled logging and performed the same tests.
>>> Here is a link to an archive with the logs; they are only from one node
>>> (the node where the active MDS was sitting):
>>> https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0
>>>
>>> Rados bench results:
>>>
>>> # rados bench -p test 10 write
>>>  Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
>>>  Object prefix: benchmark_data_atl-fs11_4630
>>>    sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>      0       0        0         0         0         0         -         0
>>>      1      16       46        30   119.967       120  0.201327  0.348463
>>>      2      16       88        72   143.969       168  0.132983  0.353677
>>>      3      16      124       108   143.972       144  0.930837  0.383018
>>>      4      16      155       139   138.976       124  0.899468  0.426396
>>>      5      16      203       187   149.575       192  0.236534  0.400806
>>>      6      16      243       227   151.309       160  0.835213  0.397673
>>>      7      16      276       260   148.549       132  0.905989  0.406849
>>>      8      16      306       290   144.978       120  0.353279  0.422106
>>>      9      16      335       319   141.757       116   1.12114  0.428268
>>>     10      16      376       360    143.98       164  0.418921   0.43351
>>>     11      16      377       361   131.254         4  0.499769  0.433693
>>> Total time run:         11.206306
>>> Total writes made:      377
>>> Write size:             4194304
>>> Bandwidth (MB/sec):     134.567
>>>
>>> Stddev Bandwidth:       60.0232
>>> Max bandwidth (MB/sec): 192
>>> Min bandwidth (MB/sec): 0
>>> Average Latency:        0.474923
>>> Stddev Latency:         0.376038
>>> Max latency:            1.82171
>>> Min latency:            0.060877
>>>
>>>
>>> # rados bench -p test 10 seq
>>>    sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>      0       0        0         0         0         0         -         0
>>>      1      16       61        45   179.957       180  0.010405   0.25243
>>>      2      16      109        93   185.962       192  0.908263  0.284303
>>>      3      16      151       135   179.965       168  0.255312  0.297283
>>>      4      16      191       175    174.97       160  0.836727  0.330659
>>>      5      16      236       220   175.971       180  0.009995  0.330832
>>>      6      16      275       259   172.639       156   1.06855  0.345418
>>>      7      16      311       295   168.545       144  0.907648  0.361689
>>>      8      16      351       335   167.474       160  0.947688  0.363552
>>>      9      16      390       374   166.196       156  0.140539  0.369057
>>> Total time run:        9.755367
>>> Total reads made:      401
>>> Read size:             4194304
>>> Bandwidth (MB/sec):    164.422
>>>
>>> Average Latency:       0.387705
>>> Max latency:           1.33852
>>> Min latency:           0.008064
>>>
>>> # rados bench -p test 10 rand
>>>    sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>      0       0        0         0         0         0         -         0
>>>      1      16       55        39   155.938       156  0.773716  0.257267
>>>      2      16       93        77   153.957       152  0.006573  0.339199
>>>      3      16      135       119   158.629       168  0.009851  0.359675
>>>      4      16      171       155   154.967       144  0.892027  0.359015
>>>      5      16      209       193   154.369       152   1.13945  0.378618
>>>      6      16      256       240    159.97       188  0.009965  0.368439
>>>      7      16      295       279     159.4       156  0.195812  0.371259
>>>      8      16      343       327   163.472       192  0.880587  0.370759
>>>      9      16      380       364    161.75       148  0.113111  0.377983
>>>     10      16      424       408   163.173       176  0.772274  0.379497
>>> Total time run:        10.518482
>>> Total reads made:      425
>>> Read size:             4194304
>>> Bandwidth (MB/sec):    161.620
>>>
>>> Average Latency:       0.393978
>>> Max latency:           1.36572
>>> Min latency:           0.006448
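For anyone who wants to reproduce the rados bench numbers above, a minimal sequence might look like the following (a sketch: the pool name "test" and PG count of 128 are just examples, and --no-cleanup, where the installed rados version supports it, keeps the benchmark objects around so the seq and rand read tests have something to read):

# create a throwaway pool just for benchmarking
ceph osd pool create test 128

# 10-second write test, keeping the objects for the read tests that follow
rados bench -p test 10 write --no-cleanup

# sequential and random read tests against the objects written above
rados bench -p test 10 seq
rados bench -p test 10 rand

# delete the scratch pool (and the benchmark objects) when finished
ceph osd pool delete test test --yes-i-really-really-mean-it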
>>>
>>> On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> Can you enable debugging on the client ("debug ms = 1", "debug client
>>>> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
>>>> again, and post the logs somewhere for me to look at?
>>>>
>>>> While you're at it, can you try rados bench and see what sort of
>>>> results you get?
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>> On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>>>> It is CephFS mounted via ceph-fuse.
>>>>> I am getting the same results regardless of how many other clients
>>>>> have this fs mounted and what they are doing.
>>>>> The cluster is running on Debian Wheezy, kernel 3.2.0-4-amd64.
>>>>>
>>>>> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>>> Are these tests conducted using a local fs on RBD, or using CephFS?
>>>>>> If CephFS, do you have multiple clients mounting the FS, and what are
>>>>>> they doing? What client (kernel or ceph-fuse)?
>>>>>> -Greg
>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> I just built a new cluster using these quickstart instructions:
>>>>>>> http://ceph.com/docs/master/start/
>>>>>>>
>>>>>>> And here is what I am seeing:
>>>>>>>
>>>>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>>>>> real    0m0.081s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.004s
>>>>>>>
>>>>>>> And if I try to repeat the same command (when the files have already been created):
>>>>>>>
>>>>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>>>>> real    0m48.894s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.004s
>>>>>>>
>>>>>>> I was very surprised and then just tried to rewrite a single file:
>>>>>>>
>>>>>>> # time echo 1 > 1.txt
>>>>>>> real    0m3.133s
>>>>>>> user    0m0.000s
>>>>>>> sys     0m0.000s
>>>>>>>
>>>>>>> BTW, I don't think it is a problem with OSD speed or the network:
>>>>>>>
>>>>>>> # time sysbench --num-threads=1 --test=fileio --file-total-size=1G --file-test-mode=rndrw prepare
>>>>>>> 1073741824 bytes written in 23.52 seconds (43.54 MB/sec).
>>>>>>>
>>>>>>> Here is my ceph cluster status and version:
>>>>>>>
>>>>>>> # ceph -w
>>>>>>>     cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
>>>>>>>      health HEALTH_OK
>>>>>>>      monmap e4: 4 mons at {atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
>>>>>>>             election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
>>>>>>>      mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
>>>>>>>      osdmap e92: 4 osds: 4 up, 4 in
>>>>>>>      pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
>>>>>>>            881 GB used, 1683 GB / 2564 GB avail
>>>>>>>                 192 active+clean
>>>>>>>   client io 1820 B/s wr, 1 op/s
>>>>>>>
>>>>>>> # ceph -v
>>>>>>> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>>>>>>
>>>>>>> All nodes are connected with a gigabit network.
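Yan's explanation at the top of the thread also suggests a quick way to confirm that the synchronous truncate is what hurts here: compare creating new files, rewriting them with ">" (which opens the file with O_TRUNC), and appending with ">>" (which does not truncate). A rough sketch, assuming the current directory is on the CephFS mount and the test_* file names are arbitrary:

# first run creates the files; the writes can be buffered asynchronously by the client
time for i in {1..10}; do echo $i > test_$i.txt ; done

# second run rewrites existing files; each ">" redirection truncates the file first,
# and (per Yan's explanation) that truncate is a synchronous operation
time for i in {1..10}; do echo $i > test_$i.txt ; done

# appending to the existing files avoids the truncate entirely
time for i in {1..10}; do echo $i >> test_$i.txt ; done

If the appends stay fast while the rewrites are slow, the truncate path is the bottleneck rather than raw OSD write speed.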
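For reference, the debug settings Greg asked for earlier in the thread would normally go into ceph.conf on the client machine and on the MDS hosts, roughly like this (a sketch; the client needs to be remounted and the MDS restarted, or the values injected at runtime, before the higher log levels take effect):

[client]
    debug ms = 1
    debug client = 20

[mds]
    debug ms = 1
    debug mds = 20

Depending on the Ceph version, something like "ceph tell mds.atl-fs12 injectargs '--debug-mds 20 --debug-ms 1'" may also raise the log level on the active MDS without a restart (atl-fs12 being the active MDS shown in the mdsmap above).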