Re: Extremely slow small files rewrite performance


 



Ouch, I think the client log is missing.
Here it is:
https://www.dropbox.com/s/650mjim2ldusr66/ceph-client.admin.log.gz?dl=0
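
For anyone reading along: the debug settings Greg asks for below are plain
ceph.conf options. A minimal sketch, assuming ceph.conf is edited on the
client host and on the active MDS host and the processes are restarted (or
the values injected at runtime) afterwards:

[client]
    debug ms = 1
    debug client = 20

[mds]
    debug ms = 1
    debug mds = 20

At runtime, something like "ceph tell mds.atl-fs12 injectargs '--debug_ms 1
--debug_mds 20'" should have the same effect for the MDS side.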

On Tue, Oct 21, 2014 at 3:22 PM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
> I enabled logging and performed the same tests.
> Here is the link to an archive with the logs; they are only from one node
> (the node where the active MDS was sitting):
> https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0
>
> Rados bench results:
>
> # rados bench -p test 10 write
>  Maintaining 16 concurrent writes of 4194304 bytes for up to 10
> seconds or 0 objects
>  Object prefix: benchmark_data_atl-fs11_4630
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16        46        30   119.967       120  0.201327  0.348463
>      2      16        88        72   143.969       168  0.132983  0.353677
>      3      16       124       108   143.972       144  0.930837  0.383018
>      4      16       155       139   138.976       124  0.899468  0.426396
>      5      16       203       187   149.575       192  0.236534  0.400806
>      6      16       243       227   151.309       160  0.835213  0.397673
>      7      16       276       260   148.549       132  0.905989  0.406849
>      8      16       306       290   144.978       120  0.353279  0.422106
>      9      16       335       319   141.757       116   1.12114  0.428268
>     10      16       376       360    143.98       164  0.418921   0.43351
>     11      16       377       361   131.254         4  0.499769  0.433693
>  Total time run:         11.206306
> Total writes made:      377
> Write size:             4194304
> Bandwidth (MB/sec):     134.567
>
> Stddev Bandwidth:       60.0232
> Max bandwidth (MB/sec): 192
> Min bandwidth (MB/sec): 0
> Average Latency:        0.474923
> Stddev Latency:         0.376038
> Max latency:            1.82171
> Min latency:            0.060877
>
>
> # rados bench -p test 10 seq
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16        61        45   179.957       180  0.010405   0.25243
>      2      16       109        93   185.962       192  0.908263  0.284303
>      3      16       151       135   179.965       168  0.255312  0.297283
>      4      16       191       175    174.97       160  0.836727  0.330659
>      5      16       236       220   175.971       180  0.009995  0.330832
>      6      16       275       259   172.639       156   1.06855  0.345418
>      7      16       311       295   168.545       144  0.907648  0.361689
>      8      16       351       335   167.474       160  0.947688  0.363552
>      9      16       390       374   166.196       156  0.140539  0.369057
>  Total time run:        9.755367
> Total reads made:     401
> Read size:            4194304
> Bandwidth (MB/sec):    164.422
>
> Average Latency:       0.387705
> Max latency:           1.33852
> Min latency:           0.008064
>
> # rados bench -p test 10 rand
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16        55        39   155.938       156  0.773716  0.257267
>      2      16        93        77   153.957       152  0.006573  0.339199
>      3      16       135       119   158.629       168  0.009851  0.359675
>      4      16       171       155   154.967       144  0.892027  0.359015
>      5      16       209       193   154.369       152   1.13945  0.378618
>      6      16       256       240    159.97       188  0.009965  0.368439
>      7      16       295       279     159.4       156  0.195812  0.371259
>      8      16       343       327   163.472       192  0.880587  0.370759
>      9      16       380       364    161.75       148  0.113111  0.377983
>     10      16       424       408   163.173       176  0.772274  0.379497
>  Total time run:        10.518482
> Total reads made:     425
> Read size:            4194304
> Bandwidth (MB/sec):    161.620
>
> Average Latency:       0.393978
> Max latency:           1.36572
> Min latency:           0.006448
>
> On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Can you enable debugging on the client ("debug ms = 1", "debug client
>> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
>> again, and post them somewhere for me to look at?
>>
>> While you're at it, can you try rados bench and see what sort of
>> results you get?
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>> It is CephFS mounted via ceph-fuse.
>>> I am getting the same results regardless of how many other clients have
>>> this fs mounted or how active they are.
>>> The cluster is running on Debian Wheezy, kernel 3.2.0-4-amd64.
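>>>
>>> For reference, the mount itself is a plain ceph-fuse invocation roughly
>>> along these lines (the mount point here is just a placeholder):
>>>
>>> # ceph-fuse -m 10.44.101.91:6789 /mnt/cephfs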
>>>
>>> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>> Are these tests conducted using a local fs on RBD, or using CephFS?
>>>> If CephFS, do you have multiple clients mounting the FS, and what are
>>>> they doing? What client (kernel or ceph-fuse)?
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>> On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>>>> Hi
>>>>>
>>>>> I just built a new cluster using these quickstart instructions:
>>>>> http://ceph.com/docs/master/start/
>>>>>
>>>>> And here is what I am seeing:
>>>>>
>>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>>> real 0m0.081s
>>>>> user 0m0.000s
>>>>> sys 0m0.004s
>>>>>
>>>>> And if I repeat the same command (when the files already exist):
>>>>>
>>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>>> real 0m48.894s
>>>>> user 0m0.000s
>>>>> sys 0m0.004s
>>>>>
>>>>> I was very surprised, so I then tried rewriting just a single file:
>>>>>
>>>>> # time echo 1 > 1.txt
>>>>> real 0m3.133s
>>>>> user 0m0.000s
>>>>> sys 0m0.000s
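>>>>>
>>>>> (As an aside: to see where that ~3 seconds is spent, per-syscall timing
>>>>> can be captured with strace -T; the traced call list below is only
>>>>> illustrative:)
>>>>>
>>>>> # strace -T -e trace=open,write,close sh -c 'echo 1 > 1.txt'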
>>>>>
>>>>> BTW, I don't think it is a problem with OSD speed or the network:
>>>>>
>>>>> # time sysbench --num-threads=1 --test=fileio --file-total-size=1G
>>>>> --file-test-mode=rndrw prepare
>>>>> 1073741824 bytes written in 23.52 seconds (43.54 MB/sec).
>>>>>
>>>>> Here is my Ceph cluster status and version:
>>>>>
>>>>> # ceph -w
>>>>>     cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
>>>>>      health HEALTH_OK
>>>>>      monmap e4: 4 mons at
>>>>> {atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
>>>>> election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
>>>>>      mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
>>>>>      osdmap e92: 4 osds: 4 up, 4 in
>>>>>       pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
>>>>>             881 GB used, 1683 GB / 2564 GB avail
>>>>>                  192 active+clean
>>>>>   client io 1820 B/s wr, 1 op/s
>>>>>
>>>>> # ceph -v
>>>>> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>>>>
>>>>> All nodes are connected via a gigabit network.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



