Re: Extremely slow small files rewrite performance

I enabled logging and ran the same tests again.
Here is a link to an archive with the logs; they are only from one node
(the node where the active MDS was running):
https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0
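
For reference, the debug levels are the ones Greg asked for below; a
minimal ceph.conf sketch of what was set on the client and MDS nodes
(injectargs on the live daemons should work just as well):

[client]
    debug ms = 1
    debug client = 20

[mds]
    debug ms = 1
    debug mds = 20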

Rados bench results:

# rados bench -p test 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10
seconds or 0 objects
 Object prefix: benchmark_data_atl-fs11_4630
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        46        30   119.967       120  0.201327  0.348463
     2      16        88        72   143.969       168  0.132983  0.353677
     3      16       124       108   143.972       144  0.930837  0.383018
     4      16       155       139   138.976       124  0.899468  0.426396
     5      16       203       187   149.575       192  0.236534  0.400806
     6      16       243       227   151.309       160  0.835213  0.397673
     7      16       276       260   148.549       132  0.905989  0.406849
     8      16       306       290   144.978       120  0.353279  0.422106
     9      16       335       319   141.757       116   1.12114  0.428268
    10      16       376       360    143.98       164  0.418921   0.43351
    11      16       377       361   131.254         4  0.499769  0.433693
 Total time run:         11.206306
Total writes made:      377
Write size:             4194304
Bandwidth (MB/sec):     134.567

Stddev Bandwidth:       60.0232
Max bandwidth (MB/sec): 192
Min bandwidth (MB/sec): 0
Average Latency:        0.474923
Stddev Latency:         0.376038
Max latency:            1.82171
Min latency:            0.060877
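
Since the problem here is small-file rewrites, a single-threaded run
with a small op size should approximate per-op write latency better
than the default 16 concurrent 4 MB writes above; a sketch (-b is the
op size in bytes, -t the number of concurrent ops):

# rados bench -p test 10 write -b 4096 -t 1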


# rados bench -p test 10 seq
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        61        45   179.957       180  0.010405   0.25243
     2      16       109        93   185.962       192  0.908263  0.284303
     3      16       151       135   179.965       168  0.255312  0.297283
     4      16       191       175    174.97       160  0.836727  0.330659
     5      16       236       220   175.971       180  0.009995  0.330832
     6      16       275       259   172.639       156   1.06855  0.345418
     7      16       311       295   168.545       144  0.907648  0.361689
     8      16       351       335   167.474       160  0.947688  0.363552
     9      16       390       374   166.196       156  0.140539  0.369057
 Total time run:        9.755367
Total reads made:     401
Read size:            4194304
Bandwidth (MB/sec):    164.422

Average Latency:       0.387705
Max latency:           1.33852
Min latency:           0.008064

# rados bench -p test 10 rand
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        55        39   155.938       156  0.773716  0.257267
     2      16        93        77   153.957       152  0.006573  0.339199
     3      16       135       119   158.629       168  0.009851  0.359675
     4      16       171       155   154.967       144  0.892027  0.359015
     5      16       209       193   154.369       152   1.13945  0.378618
     6      16       256       240    159.97       188  0.009965  0.368439
     7      16       295       279     159.4       156  0.195812  0.371259
     8      16       343       327   163.472       192  0.880587  0.370759
     9      16       380       364    161.75       148  0.113111  0.377983
    10      16       424       408   163.173       176  0.772274  0.379497
 Total time run:        10.518482
Total reads made:     425
Read size:            4194304
Bandwidth (MB/sec):    161.620

Average Latency:       0.393978
Max latency:           1.36572
Min latency:           0.006448
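
The bench runs leave benchmark_data_* objects behind in the pool; if
anyone repeats this, something along these lines should clean them up
afterwards (a sketch, matching the object prefix shown above):

# rados -p test ls | grep '^benchmark_data' | xargs -r -n1 rados -p test rm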

On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> Can you enable debugging on the client ("debug ms = 1", "debug client
> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
> again, and post them somewhere for me to look at?
>
> While you're at it, can you try rados bench and see what sort of
> results you get?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>> It is CephFS mounted via ceph-fuse.
>> I am getting the same results regardless of how many other clients
>> have this fs mounted and what they are doing.
>> The cluster is running on Debian Wheezy, kernel 3.2.0-4-amd64.
>>
>> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> Are these tests conducted using a local fs on RBD, or using CephFS?
>>> If CephFS, do you have multiple clients mounting the FS, and what are
>>> they doing? What client (kernel or ceph-fuse)?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazarov <natarajaya@xxxxxxxxx> wrote:
>>>> Hi
>>>>
>>>> I just built a new cluster using these quickstart instructions:
>>>> http://ceph.com/docs/master/start/
>>>>
>>>> And here is what I am seeing:
>>>>
>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>> real 0m0.081s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> And if I try to repeat the same command (when the files already exist):
>>>>
>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>> real 0m48.894s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> I was very surprised, so I then tried rewriting a single file:
>>>>
>>>> # time echo 1 > 1.txt
>>>> real 0m3.133s
>>>> user 0m0.000s
>>>> sys 0m0.000s
>>>>
>>>> BTW, I don't think it is a problem with OSD speed or the network:
>>>>
>>>> # time sysbench --num-threads=1 --test=fileio --file-total-size=1G
>>>> --file-test-mode=rndrw prepare
>>>> 1073741824 bytes written in 23.52 seconds (43.54 MB/sec).
>>>>
>>>> Here is my ceph cluster status and version:
>>>>
>>>> # ceph -w
>>>>     cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
>>>>      health HEALTH_OK
>>>>      monmap e4: 4 mons at
>>>> {atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
>>>> election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
>>>>      mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
>>>>      osdmap e92: 4 osds: 4 up, 4 in
>>>>       pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
>>>>             881 GB used, 1683 GB / 2564 GB avail
>>>>                  192 active+clean
>>>>   client io 1820 B/s wr, 1 op/s
>>>>
>>>> # ceph -v
>>>> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>>>
>>>> All nodes are connected via a gigabit network.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



