Re: poor OSD performance using kernel 3.4

On 29/05/2012 11:46, Stefan Priebe - Profihost AG wrote:
> It would be really nice if somebody from Inktank could comment on this
> whole situation.

Hello.
I think I have the same bug:

My setup has 8 OSD nodes, 3 MDS (1 active) and 3 MON.
All my machines run Debian with a custom 3.4.0 kernel. Ceph is 0.47.2-1~bpo60+1 (Debian package).
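
For reference, this is how I check the cluster layout before running the benchmarks (standard ceph CLI, nothing specific to my setup beyond the host I run it from):

# Overall health, monitor quorum and OSD count
ceph -s

# Placement of the 8 OSDs across the hosts in the CRUSH hierarchy
ceph osd tree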

root@label5:~#  rados -p data bench 20 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16        99        83     331.9       332  0.059756 0.0946512
    2      16       141       125   249.946       168  0.049822  0.212338
    3      16       166       150   199.963       100  0.057352  0.257179
    4      16       227       211   210.965       244  0.043592  0.265005
    5      16       257       241   192.767       120  0.040883  0.276718
    6      16       260       244   162.641        12   1.59593  0.293439
    7      16       319       303   173.118       236  0.056913  0.357856
    8      16       348       332   165.976       116  0.052954  0.332424
    9      16       348       332   147.535         0         -  0.332424
   10      16       472       456   182.374       248  0.038543  0.343745
   11      16       485       469   170.522        52  0.040475  0.347328
   12      16       485       469   156.312         0         -  0.347328
   13      16       517       501   154.133        64  0.047759  0.378595
   14      16       562       546    155.98       180  0.042814  0.395036
   15      16       563       547   145.847         4  0.045834  0.394398
   16      16       563       547   136.732         0         -  0.394398
   17      16       563       547   128.689         0         -  0.394398
   18      16       667       651   144.648   138.667   0.06501  0.440847
   19      16       703       687   144.613       144  0.040772  0.421935
min lat: 0.030505 max lat: 5.05834 avg lat: 0.421935
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   20      16       703       687   137.382         0         -  0.421935
   21      16       704       688   131.031         2   2.65675  0.425184
   22      14       704       690   125.439         8   3.26857  0.433417
Total time run:        22.042041
Total writes made:     704
Write size:            4194304
Bandwidth (MB/sec):    127.756

Average Latency:       0.498932
Max latency:           5.05834
Min latency:           0.030505


What puzzles me is that if I test with the rbd pool instead:


root@label5:~#  rados -p rbd bench 20 write -t 16
Maintaining 16 concurrent writes of 4194304 bytes for at least 20 seconds.
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    0       0         0         0         0         0         -         0
    1      16       191       175   699.782       700  0.236737 0.0841979
    2      16       397       381   761.837       824  0.065643 0.0813094
    3      16       602       586   781.193       820   0.07921 0.0808584
    4      16       815       799    798.88       852  0.066597 0.0785906
    5      16      1026      1010   807.885       844   0.10364 0.0785475
    6      16      1249      1233   821.886       892  0.069324 0.0773951
    7      16      1461      1445   825.608       848  0.053176 0.0770628
    8      16      1680      1664   831.895       876   0.09612 0.0765263
    9      16      1897      1881   835.891       868  0.100736 0.0761617
   10      16      2105      2089   835.491       832  0.114913 0.0761897
   11      16      2329      2313   840.983       896  0.042009 0.0758589
   12      16      2553      2537   845.559       896   0.07017 0.0754364
   13      16      2786      2770   852.203       932  0.066365 0.0749136
   14      16      3009      2993   855.041       892   0.06491 0.0746046
   15      16      3228      3212   856.431       876   0.05698 0.0745573
   16      16      3437      3421   855.148       836  0.062162 0.0746339
   17      16      3652      3636   855.428       860  0.140451  0.074534
   18      16      3878      3862   858.121       904  0.081505 0.0743125
   19      16      4106      4090   860.952       912  0.079922 0.0742146
min lat: 0.032342 max lat: 0.63151 avg lat: 0.0741575
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   20      16      4324      4308   861.495       872   0.06199 0.0741575
Total time run:        20.102264
Total writes made:     4325
Write size:            4194304
Bandwidth (MB/sec):    860.600

Average Latency:       0.0743131
Max latency:           0.63151
Min latency:           0.032342


As you can see, the bandwidth is much more stable with this pool.

I understand the data and rbd pools probably don't use the same internals, but is this difference expected?
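
In case it helps anyone reproduce this, a first sanity check I would run is to compare the two pools' settings and current usage; the exact output format below is from my 0.47 cluster, so it may look slightly different on other versions:

# Show replica count, crush_ruleset and pg_num for every pool; the data
# and rbd pools should look identical apart from the pool id
# (the exact line format varies between Ceph versions)
ceph osd dump | grep pool

# Per-pool object counts and space used, to see whether the data pool
# already holds objects (it is the default CephFS data pool) while
# rbd is empty
rados df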

Disclaimer: I am by no means a Ceph expert; I'm just experimenting with it and still don't understand all the internals.


Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx



