Poor performance with RBD volumes

Hello

I have two Ceph clusters, configured as follows:

production:
# cat /etc/ceph/ceph.conf
[global]
  auth supported = cephx
  keyring = /srv/ceph/keyring.admin

[mon]
  mon data = /srv/ceph/mon
  mon clock drift allowed = 0.5

[mon.n11c1]
  host = n11c1
  mon addr = 1.1.1.49:6789
[mon.cc2]
  host = cc2
  mon addr = 1.1.1.48:6789
[mon.n13c1]
  host = n13c1
  mon addr = 1.1.1.51:6789

[mds]
  debug mds = 1
  keyring = /srv/ceph/ceph-stage2/keyring.$name

[mds.n11c1]
  host = n11c1

[osd]
  osd data = /srv/ceph/$name
  osd journal = /srv/ceph/$name.journal
  osd journal size = 1000
  keyring = /srv/ceph/ceph-stage2/keyring.$name
  debug osd = 1

[osd.2]
  host = n14c1
[osd.0]
  host = n11c1
[osd.3]
  host = n13c1
[osd.1]
  host = n12c1

staging:
# cat /etc/ceph/ceph.conf
[global]
  auth supported = cephx
  keyring = /srv/ceph/keyring.admin

[mon]
  mon data = /srv/ceph/mon
  mon clock drift allowed = 0.5

[mon.cc]
  host = cc
  mon addr = 1.1.1.35:6789
[mon.n3cc]
  host = n3cc
  mon addr = 1.1.1.34:6789

[mds]
  debug mds = 1
  keyring = /srv/ceph/ceph-stage2/keyring.$name

[mds.cc]
  host = cc
[mds.n3cc]
  host = n3cc
  mds standby replay = true
  mds standby for name = cc

[osd]
  osd data = /srv/ceph/$name
  osd journal = /srv/ceph/$name.journal
  osd journal size = 1000
  keyring = /srv/ceph/ceph-stage2/keyring.$name
  debug osd = 1

[osd.0]
  host = cc
[osd.1]
  host = n3cc

I am using RBD volumes mapped into virtual machines that run
PostgreSQL and MongoDB databases. Right now there are 15 clients with
PostgreSQL and 10 with MongoDB. All clients generate at most 0.4 IOPS
(both read and write). Here are the graphs (writes per second) for
the OSD nodes from last week:
https://www.dropbox.com/s/djnpcxb6a9ktzv8/ceph_stats1.png
https://www.dropbox.com/s/ak7npkhm776jarp/ceph_stats2.png
https://www.dropbox.com/s/3lzfaku1nourmle/ceph_stats3.png
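
For reference, the volumes were created and mapped with the standard
rbd tool, roughly like this (the pool and image names below are just
placeholders, not the real ones):

# rbd create pgdata --pool rbd --size 10240   # size in MB
# rbd map pgdata --pool rbd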

Is it normal for Ceph to generate this many writes? On the staging
cluster there are no clients connected, yet there are still about 40
writes/s. I have also checked the admin socket and dumped
ops_in_flight: there were 0 ops.
The situation on production is also strange to me: there are a lot of
writes that do not reflect the volume usage on the clients. It looks
like Ceph either has a big overhead or, more likely, something is
wrong with the cluster.
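
The ops_in_flight check was done through the OSD admin socket, along
these lines (the socket path below is the default one and may differ
depending on how the daemons were started):

# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight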

Benchmarks from both clusters:

root@n12c1[production]:/srv/ceph# rados -p todo-list bench 60 write -b 4096 -t 16
 Maintaining 16 concurrent writes of 4096 bytes for at least 60 seconds.
 Object prefix: benchmark_data_n12c1_2638
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        20         4 0.0156214  0.015625  0.798329  0.696832
     2      16        20         4 0.00781079         0         -  0.696832
     3      16        20         4 0.00520727         0         -  0.696832
     4      16        20         4 0.00390549         0         -  0.696832
     5      16        20         4 0.00312442         0         -  0.696832
     6      16        20         4 0.00260369         0         -  0.696832
     7      16        20         4 0.00223175         0         -  0.696832
     8      16        20         4 0.00195278         0         -  0.696832
     9      16        20         4 0.00173581         0         -  0.696832
    10      16        20         4 0.00156223         0         -  0.696832
    11      16        20         4 0.00142021         0         -  0.696832
    12      16        20         4 0.00130186         0         -  0.696832
    13      16        20         4 0.00120171         0         -  0.696832
    14      16        20         4 0.00111588         0         -  0.696832
    15      16        20         4 0.00104149         0         -  0.696832
    16      16        20         4 0.000976395        0         -  0.696832
    17      16        20         4 0.000918959        0         -  0.696832
    18      16        20         4 0.000867905        0         -  0.696832
    19      16        24         8 0.00164445 0.000868056  18.9809   9.83857
2012-10-16 16:10:18.553915 min lat: 0.391531 max lat: 18.9809 avg lat: 9.83857
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      16        24         8 0.00156223         0         -   9.83857
    21      16        25         9 0.00167381 0.00195312   20.6347   11.0381
    22      16        36        20 0.00355051  0.0429688  0.259829   14.4342
    23      16        36        20 0.00339614         0         -   14.4342
    24      16        36        20 0.00325464         0         -   14.4342
    25      16        37        21 0.00328068 0.00130208   23.8748   14.8838
    26      16        37        21  0.0031545         0         -   14.8838
    27      16        37        21 0.00303766         0         -   14.8838
    28      16        37        21 0.00292918         0         -   14.8838
    29      16        37        21 0.00282817         0         -   14.8838
    30      16        37        21  0.0027339         0         -   14.8838
    31      16        37        21 0.00264571         0         -   14.8838
    32      16        37        21 0.00256303         0         -   14.8838
    33      16        37        21 0.00248537         0         -   14.8838
    34      16        37        21 0.00241227         0         -   14.8838
    35      16        37        21 0.00234335         0         -   14.8838
    36      16        37        21 0.00227825         0         -   14.8838
    37      16        37        21 0.00221668         0         -   14.8838
    38      16        37        21 0.00215834         0         -   14.8838
    39      16        37        21   0.002103         0         -   14.8838
2012-10-16 16:10:38.557419 min lat: 0.258984 max lat: 23.8748 avg lat: 14.8838
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    40      16        37        21 0.00205043         0         -   14.8838
    41      16        37        21 0.00200041         0         -   14.8838
    42      16        37        21 0.00195279         0         -   14.8838
    43      16        37        21 0.00190737         0         -   14.8838
    44      16        37        21 0.00186402         0         -   14.8838
    45      16        37        21  0.0018226         0         -   14.8838
    46      16        37        21 0.00178298         0         -   14.8838
    47      16        37        21 0.00174504         0         -   14.8838
    48      16        37        21 0.00170869         0         -   14.8838
    49      16        37        21 0.00167382         0         -   14.8838
    50      16        37        21 0.00164034         0         -   14.8838
    51      16        37        21 0.00160818         0         -   14.8838
    52      16        37        21 0.00157725         0         -   14.8838
    53      16        37        21 0.00154749         0         -   14.8838
    54      16        37        21 0.00151884         0         -   14.8838
    55      16        37        21 0.00149122         0         -   14.8838
    56      16        37        21 0.00146459         0         -   14.8838
    57      16        37        21  0.0014389         0         -   14.8838
    58      16        37        21 0.00141409         0         -   14.8838
    59      16        37        21 0.00139012         0         -   14.8838
2012-10-16 16:10:58.560825 min lat: 0.258984 max lat: 23.8748 avg lat: 14.8838
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    60      16        37        21 0.00136695         0         -   14.8838
    61      16        37        21 0.00134454         0         -   14.8838
    62      16        37        21 0.00132286         0         -   14.8838
    63      16        37        21 0.00130186         0         -   14.8838
    64      16        37        21 0.00128152         0         -   14.8838
    65      16        37        21  0.0012618         0         -   14.8838
    66       6        38        32 0.00189361 0.00104802    44.154   25.6537
    67       6        38        32 0.00186535         0         -   25.6537
 Total time run:         67.779471
Total writes made:      38
Write size:             4096
Bandwidth (MB/sec):     0.002

Stddev Bandwidth:       0.00555568
Max bandwidth (MB/sec): 0.0429688
Min bandwidth (MB/sec): 0
Average Latency:        27.8774
Stddev Latency:         17.9306
Max latency:            64.4662
Min latency:            0.258984

root@cc[staging]:/var/images# rados -p todo-list-test bench 60 write -b 4096 -t 16
 Maintaining 16 concurrent writes of 4096 bytes for at least 60 seconds.
 Object prefix: benchmark_data_cc_20637
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16       129       113  0.441308  0.441406  0.089088  0.130034
     2      16       228       212  0.413976  0.386719  0.170227  0.144586
     3      16       324       308  0.400965     0.375   0.17551  0.149879
     4      16       481       465  0.454019  0.613281  0.081618  0.136798
     5      16       628       612  0.478042  0.574219  0.080181  0.124768
     6      16       641       625  0.406831 0.0507812  0.707781  0.136926
     7      16       641       625  0.348713         0         -  0.136926
     8      16       659       643  0.313911 0.0351562  0.281386  0.193741
     9      16       739       723  0.313749    0.3125  0.177852  0.196558
    10      16       865       849  0.331585  0.492188  0.142906   0.18791
    11      16      1009       993  0.352569    0.5625  0.089554  0.176182
    12      16      1059      1043  0.339462  0.195312  0.132217  0.174339
    13      16      1075      1059  0.318157    0.0625   1.57543  0.195923
    14      16      1075      1059  0.295431         0         -  0.195923
    15      16      1076      1060  0.275996 0.00195312   2.07952    0.1977
    16      16      1156      1140  0.278275    0.3125  0.158087   0.22184
    17      16      1204      1188  0.272933    0.1875   0.08167  0.223639
    18      16      1221      1205  0.261459 0.0664062  0.714995  0.236254
    19      16      1222      1206  0.247904 0.00390625  0.293113  0.236301
2012-10-16 16:14:21.879463 min lat: 0.066844 max lat: 2.96921 avg lat: 0.247091
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      16      1302      1286  0.251131    0.3125   0.19978  0.247091
    21      16      1317      1301  0.241961 0.0585938  0.203456  0.246584
    22      16      1399      1383   0.24552  0.320312  0.158703  0.253264
    23      16      1511      1495  0.253864    0.4375  0.164127  0.245372
    24      16      1654      1638  0.266557  0.558594  0.086673  0.234202
    25      16      1718      1702  0.265894      0.25   0.27639  0.230337
    26      16      1720      1704  0.255967 0.0078125   1.58549  0.231933
    27      16      1785      1769  0.255889  0.253906  0.140973  0.238474
    28      16      1816      1800  0.251075  0.121094  0.179986  0.247877
    29      16      1913      1897   0.25548  0.378906  0.103972  0.243542
    30      16      2057      2041  0.265712    0.5625  0.071993  0.234536
    31      16      2233      2217  0.279314    0.6875  0.090504  0.223128
    32      16      2264      2248  0.274369  0.121094   0.11962  0.221455
    33      16      2281      2265  0.268067 0.0664062   0.22729  0.232648
    34      16      2281      2265  0.260183         0         -  0.232648
    35      16      2298      2282  0.254646 0.0332031  0.227998  0.244199
    36      16      2425      2409  0.261351  0.496094  0.116612  0.238956
    37      16      2553      2537  0.267798       0.5  0.126228  0.232921
    38      16      2652      2636  0.270926  0.386719   0.12253  0.229696
    39      16      2665      2649  0.265281 0.0507812  0.205365  0.229567
2012-10-16 16:14:41.882716 min lat: 0.066844 max lat: 2.96921 avg lat: 0.234852
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    40      16      2730      2714  0.264996  0.253906  0.116825  0.234852
    41      16      2730      2714  0.258532         0         -  0.234852
    42      16      2747      2731  0.253958 0.0332031  0.155447  0.245442
    43      16      2844      2828  0.256862  0.378906  0.159537  0.242571
    44      16      2892      2876  0.255285    0.1875  0.548109  0.242033
    45      16      2955      2939   0.25508  0.246094  0.245377  0.243047
    46      16      2957      2941  0.249705 0.0078125   1.33743  0.243768
    47      16      3021      3005   0.24971      0.25  0.226326  0.249331
    48      16      3021      3005  0.244508         0         -  0.249331
    49      16      3067      3051  0.243184 0.0898438  0.218276  0.255766
    50      16      3068      3052  0.238399 0.00390625   1.28465  0.256103
    51      16      3068      3052  0.233724         0         -  0.256103
    52      16      3085      3069  0.230506 0.0332031   0.78895   0.26712
    53      16      3181      3165  0.233232     0.375  0.128619  0.267592
    54      16      3244      3228  0.233469  0.246094  0.162226  0.265363
    55      16      3245      3229  0.229295 0.00390625   1.23061  0.265662
    56      16      3326      3310   0.23085  0.316406  0.130218  0.270394
    57      16      3421      3405  0.233309  0.371094  0.271192   0.26772
    58      16      3422      3406  0.229354 0.00390625  0.243273  0.267713
    59      16      3472      3456  0.228777  0.195312  0.172423  0.272543
2012-10-16 16:15:01.885897 min lat: 0.066844 max lat: 2.96921 avg lat: 0.271267
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    60      16      3536      3520   0.22913      0.25  0.245948  0.271267
    61      16      3536      3520  0.225373         0         -  0.271267
 Total time run:         61.778350
Total writes made:      3537
Write size:             4096
Bandwidth (MB/sec):     0.224

Stddev Bandwidth:       0.199931
Max bandwidth (MB/sec): 0.6875
Min bandwidth (MB/sec): 0
Average Latency:        0.278853
Stddev Latency:         0.441319
Max latency:            2.96921
Min latency:            0.066844

The biggest problem with my clusters is the poor performance. A
simple insert into a database on a client often takes seconds. That
is unacceptable, and I am trying to find the bottleneck. Could you
please help me with it?
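
One thing I can easily check on the OSD nodes is per-disk latency and
utilisation with iostat from sysstat ("sda" below is just an example
device name for the RAID0 array):

# iostat -x sda 5    # watch await and %util while a client does inserts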

The replica count is set to 2. Kernel 3.2.23, Ceph 0.52; OSD data is
stored on the same partition (two 7200 rpm disks in RAID0) as the
journal files, on btrfs. The filesystem on n11c1 is 4 months old,
n12c1 and n14c1 are around 1.5 months old, and cc and n3cc are around
3 months old. Both clusters are healthy. Production has around 544
PGs and 46 pools, staging 232 PGs and 8 pools.

-- 
Regards
Maciej Galkiewicz