Slow read performance

I'm sure you know, but xfs is the recommended filesystem for glusterfs.
Ext4 has a number of issues (particularly on CentOS/Red Hat 6).

The default inode size for ext4 (and xfs) is small for the number of extended
attributes glusterfs uses. On xfs this causes a minor performance hit once
the extended attributes outgrow the 256-byte default inode size. On xfs the
fix is to create the filesystem with 512-byte inodes. I don't know offhand
how big the impact is on ext4, but on a couple of boxes I have, some ext4
filesystems use 128-byte inodes and some use 256-byte inodes (both of which
are too small for glusterfs). The performance hit is that every time extended
attributes need to be read, the filesystem has to seek to and read several
inodes.

Run "dumpe2fs -h <blockdevice> | grep size" against the block devices behind
your ext4 mountpoints.
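
If you want to script that check, here is a minimal sketch. The function names are made up for illustration, and the 512-byte threshold is only an assumption carried over from the xfs figure above; I have not measured what ext4 actually needs.

```shell
# Print the inode size (bytes) from `dumpe2fs -h` output read on stdin.
# Hypothetical helper: it matches the "Inode size:" line and nothing else.
ext4_inode_size() {
    awk '$1 == "Inode" && $2 == "size:" { print $3; exit }'
}

# Succeed if the given inode size is at least 512 bytes.
# Assumption: 512 is the xfs recommendation, reused here as a rough bar.
inode_size_ok() {
    [ "$1" -ge 512 ]
}

# Usage (placeholder device, needs root):
#   size=$(dumpe2fs -h <blockdevice> 2>/dev/null | ext4_inode_size)
#   inode_size_ok "$size" || echo "inode size $size is probably too small"
```

The parsing is kept separate from the dumpe2fs call so it can be tried against saved output without touching a real device.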

If it is not too much of a bother, I'd try xfs as the filesystem for your
bricks:

mkfs.xfs -i size=512 <blockdevice>
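
To double-check an existing xfs brick afterwards, xfs_info prints an isize= field on its first line. A rough sketch that pulls the value out, assuming the usual xfs_info output layout (xfs_isize is a made-up helper name, not part of xfsprogs):

```shell
# Print the inode size (bytes) from `xfs_info` output read on stdin.
# Hypothetical helper: it reports the first "isize=<number>" field it finds.
xfs_isize() {
    sed -n 's/.*isize=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (mount point is a placeholder):
#   xfs_info /path/to/brick | xfs_isize
```

A brick created with the mkfs.xfs line above should report 512 here.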

Please see this for more detailed info:
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Storage/2.0/html-single/Administration_Guide/index.html#chap-User_Guide-Setting_Volumes


On Thu, Mar 7, 2013 at 12:08 PM, Thomas Wakefield <twake at cola.iges.org> wrote:

> Everything is built as ext4, no options other than lazy_itable_init=1 when
> I built the filesystems.
>
> Server mount example:
> LABEL=disk2a /storage/disk2a ext4 defaults 0 0
>
> Client mount:
> fs-disk2:/shared /shared glusterfs defaults 0 0
>
> Remember, the slow reads are only from gluster clients, the disks are
> really fast when I am local on the server testing the disks.
>
>
> -Tom
>
>
>
>
> On Mar 7, 2013, at 1:09 PM, Bryan Whitehead <driver at megahappy.net> wrote:
>
> Was just thinking, what are your mount options for your bricks (using
> inode64?)? Also, you are using xfs... right?
>
> When you created the filesystems did you allocate more inode space? -i
> size=512 ?
>
>
> On Thu, Mar 7, 2013 at 5:49 AM, Thomas Wakefield <twake at cola.iges.org> wrote:
>
>> Still looking for help.
>>
>>
>> On Mar 4, 2013, at 7:43 AM, Thomas Wakefield <twake at iges.org> wrote:
>>
>> Also, I tested an NFS mount over the same 10Gb link and was able to pull
>> almost 200MB/s, but Gluster is still much slower.  I also ran a longer
>> test with 105GB of data, and it still showed that writing is MUCH faster.
>> That makes no sense when the disks can read 2x as fast as they can write.
>>
>> Any other thoughts?
>>
>> [root at cpu_crew1 ~]# dd if=/dev/zero
>> of=/shared/working/benchmark/test.cpucrew1 bs=512k count=200000 ; dd
>> if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=512k
>> 200000+0 records in
>> 200000+0 records out
>> 104857600000 bytes (105 GB) copied, 159.135 seconds, 659 MB/s
>> 200000+0 records in
>> 200000+0 records out
>> 104857600000 bytes (105 GB) copied, 1916.87 seconds, 54.7 MB/s
>>
>>
>> On Mar 1, 2013, at 9:58 AM, Thomas Wakefield <twake at iges.org> wrote:
>>
>> The max setting for performance.read-ahead-page-count is 16, which I did
>> just try.  No significant change.
>>
>> Any other setting options?
>>
>>
>>
>> On Feb 28, 2013, at 10:18 PM, Anand Avati <anand.avati at gmail.com> wrote:
>>
>> Can you try "gluster volume set <volname>
>> performance.read-ahead-page-count 64" or some value higher or lower?
>>
>> Avati
>>
>> On Thu, Feb 28, 2013 at 7:15 PM, Thomas Wakefield <twake at iges.org> wrote:
>>
>>> Good point, forgot to set a block size, here are the redone dd tests:
>>>
>>> [root at cpu_crew1 ~]# dd if=/shared/working/benchmark/test.cpucrew1
>>> of=/dev/null bs=128k
>>> 40000+0 records in
>>> 40000+0 records out
>>> 5242880000 bytes (5.2 GB) copied, 65.4928 seconds, 80.1 MB/s
>>> [root at cpu_crew1 ~]# dd if=/shared/working/benchmark/test.cpucrew1
>>> of=/dev/null bs=1M
>>> 5000+0 records in
>>> 5000+0 records out
>>> 5242880000 bytes (5.2 GB) copied, 49.0907 seconds, 107 MB/s
>>> [root at cpu_crew1 ~]# dd if=/shared/working/benchmark/test.cpucrew1
>>> of=/dev/null bs=4M
>>> 1250+0 records in
>>> 1250+0 records out
>>> 5242880000 bytes (5.2 GB) copied, 44.5724 seconds, 118 MB/s
>>>
>>> Still not impressive.
>>>
>>> -Tom
>>>
>>>
>>> On Feb 28, 2013, at 8:42 PM, Jeff Anderson-Lee <jonah at eecs.berkeley.edu>
>>> wrote:
>>>
>>>  Thomas,
>>>
>>> You have not specified a block size, so you are doing a huge number of
>>> small(ish) reads with associated round trips. What happens with dd bs=128k
>>> ..?
>>>
>>> Jeff Anderson-Lee
>>>
>>> On 2/28/2013 5:30 PM, Thomas Wakefield wrote:
>>>
>>> Did a fresh dd test just to confirm, same results:
>>>
>>>  [root at cpu_crew1 benchmark]# dd if=/dev/zero
>>> of=/shared/working/benchmark/test.cpucrew1 bs=512k count=10000
>>> 10000+0 records in
>>> 10000+0 records out
>>> 5242880000 bytes (5.2 GB) copied, 7.43695 seconds, 705 MB/s
>>> [root at cpu_crew1 benchmark]# dd
>>> if=/shared/working/benchmark/test.cpucrew1 of=/dev/null
>>> 552126+0 records in
>>> 552125+0 records out
>>> 282688000 bytes (283 MB) copied, 37.8514 seconds, 7.5 MB/s
>>>
>>>
>>>
>>>  On Feb 28, 2013, at 8:14 PM, Bryan Whitehead <driver at megahappy.net>
>>> wrote:
>>>
>>>  How are you doing the reading? Is this still an iozone benchmark?
>>>
>>>  if you simply dd if=/glustermount/bigfile of=/dev/null, is the speed
>>> better?
>>>
>>>
>>> On Thu, Feb 28, 2013 at 5:05 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>
>>>> I get great speed locally, it's only when I add gluster in that it
>>>> slows down.  I get 2GB/s locally to the exact same brick.  It's gluster
>>>> that is having the read issue (80MB/s).  But Gluster can write just fine,
>>>> 800MB/s.
>>>>
>>>>  The blockdev idea is a good one, and I have already done it.  Thanks
>>>> though.
>>>>
>>>>  -Tom
>>>>
>>>>   On Feb 28, 2013, at 7:53 PM, Ling Ho <ling at slac.stanford.edu> wrote:
>>>>
>>>>  Tom,
>>>>
>>>> What type of disks do you have? If they are raid 5 or 6, have you tried
>>>> setting the read-ahead size to 8192 or 16384 (blockdev --setra 8192
>>>> /dev/<sd?>)?
>>>>
>>>> ...
>>>> ling
>>>>
>>>> On 02/28/2013 04:23 PM, Thomas Wakefield wrote:
>>>>
>>>> Did anyone else have any ideas on performance tuning for reads?
>>>>
>>>>  On Feb 27, 2013, at 9:29 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>>
>>>>  Bryan-
>>>>
>>>>  Yes I can write at 700-800MBytes/sec, but I can only read at *70-80
>>>> MBytes/sec*.  I would be very happy if I could get it to read at the
>>>> same speed it can write at.  And the 70-80 is sequential, not random for
>>>> reads, same exact test commands on the disk server are in the 2+GB/s range,
>>>> so I know the disk server can do it.
>>>>
>>>>  -Tom
>>>>
>>>>
>>>>  On Feb 27, 2013, at 7:41 PM, Bryan Whitehead <driver at megahappy.net>
>>>> wrote:
>>>>
>>>>  Are your figures 700-800M*Byte*/sec? Because that is probably as fast
>>>> as your 10G nic cards are able to do. You can test that by trying to push a
>>>> large amount of data over nc or ftp.
>>>>
>>>>  Might want to try Infiniband. 40G cards are pretty routine.
>>>>
>>>>
>>>> On Wed, Feb 27, 2013 at 3:45 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>>
>>>>> I also get the same performance running iozone for large file
>>>>> sizes, iozone -u 1 -r 512k -s 2G -I -F.
>>>>>
>>>>>  Large file IO is what I need the system to do.  I am just shocked at
>>>>> the huge difference between local IO and gluster client IO.  I know there
>>>>> should be some difference, but 10x is unacceptable.
>>>>>
>>>>>  -Tom
>>>>>
>>>>>
>>>>>
>>>>>  On Feb 27, 2013, at 5:31 PM, Bryan Whitehead <driver at megahappy.net>
>>>>> wrote:
>>>>>
>>>>>  Every time you open/close a file or a directory you will have to
>>>>> wait for locks which take time. This is totally expected.
>>>>>
>>>>>  Why don't you share what you want to do? iozone benchmarks look like
>>>>> crap but serving qcow2 files to qemu works fantastic for me. What are you
>>>>> doing? Make a benchmark that does that. If you are going to have many files
>>>>> with a wide variety of sizes glusterfs/fuse might not be what you are
>>>>> looking for.
>>>>>
>>>>>
>>>>> On Wed, Feb 27, 2013 at 12:56 PM, Thomas Wakefield <
>>>>> twake at cola.iges.org> wrote:
>>>>>
>>>>>> I have tested everything, small and large files.  I have used file
>>>>>> sizes ranging from 128k up to multiple GB files.  All the reads are bad.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Here is a fairly exhaustive iozone auto test:
>>>>>>
>>>>>>                                                              random  random    bkwd   record   stride
>>>>>>               KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
>>>>>>               64       4   40222   63492    26868    30060    1620
>>>>>> 71037    1572    70570    31294    77096    72475   14736    13928
>>>>>>               64       8   99207  116366    13591    13513    3214
>>>>>> 97690    3155   109978    28920   152018   158480   18936    17625
>>>>>>               64      16  230257  253766    25156    28713   10867
>>>>>>  223732    8873   244297    54796   303383   312204   15062    13545
>>>>>>               64      32  255943  234481  5735102  7100397   11897
>>>>>>  318502   13681   347801    24214   695778   528618   25838    28094
>>>>>>               64      64  214096  681644  6421025  7100397   27453
>>>>>>  292156   28117   621657    27338   376062   512471   28569    32534
>>>>>>              128       4   74329   75468    26428    41089    1131
>>>>>> 72857    1118    66976     1597    73778    78343   13351    13026
>>>>>>              128       8  100862  135170    24966    16734    2617  118966    2560   120406    39156   125121   146613
>>>>>> 16177    16180
>>>>>>              128      16  115114  253983    28212    17854    5307
>>>>>>  246180    5431   229843    47335   255920   271173   27256    24445
>>>>>>              128      32  256042  391360    39848    64258   11329
>>>>>>  290230    9905   429563    38176   490380   463696   20917    19219
>>>>>>              128      64  248573  592699  4557257  6812590   19583
>>>>>>  452366   29263   603357    42967   814915   692017   76327    37604
>>>>>>              128     128  921183  526444  5603747  5379161   45614
>>>>>>  390222   65441   826202    41384   662962  1040839   78526    39023
>>>>>>              256       4   76212   77337    40295    32125    1289
>>>>>> 71866    1261    64645     1436    57309    53048   23073    29550
>>>>>>              256       8  126922  141976    26237    25130    2566  128058    2565
>>>>>>   138981     2985   125060   133603   22840    24955
>>>>>>              256      16  242883  263636    41850    24371    4902
>>>>>>  250009    5290   248792    89353   243821   247303   26965    26199
>>>>>>              256      32  409074  439732    40101    39335   11953
>>>>>>  436870   11209   430218    83743   409542   479390   30821    27750
>>>>>>              256      64  259935  571502    64840    71847   22537  617161   23383   392047    91852   672010   802614   41673    53111
>>>>>>              256     128  847597  812329   185517    83198   49383  708831   44668   794889    74267  1180188  1662639   54303    41018
>>>>>>              256     256  481324  709299  5217259  5320671   44668
>>>>>>  719277   40954   808050    41302   790209   771473   62224    35754
>>>>>>              512       4   77667   75226    35102    29696    1337
>>>>>> 66262    1451    67680     1413    69265    69142   42084    27897
>>>>>>              512       8  134311  144341    30144    24646    2102  134143    2209   134699     2296   108110   128616
>>>>>> 25104    29123
>>>>>>              512      16  200085  248787    30235    25697    4196
>>>>>>  247240    4179   256116     4768   250003   226436   32351    28455
>>>>>>              512      32  330341  439805    26440    39284    8744
>>>>>>  457611    8006   424168   125953   425935   448813   27660    26951
>>>>>>              512      64  483906  733729    48747    41121   16032
>>>>>>  555938   17424   587256   187343   366977   735740   41700    41548
>>>>>>              512     128  836636  907717    69359    94921   42443  761031   36828   964378   123165   651383   695697   58368    44459
>>>>>>              512     256  520879  860437   145534   135523   40267
>>>>>>  847532   31585   663252    69696  1270846  1492545   48822    48092
>>>>>>              512     512  782951  973118  3099691  2942541   42328
>>>>>>  871966   46218   911184    49791   953248  1036527   52723    48347
>>>>>>             1024       4   76218   69362    36431    28711    1137
>>>>>> 66171    1174    68938     1125    70566    70845   34942    28914
>>>>>>             1024       8  126045  140524    37836    15664    2698  126000    2557   125566     2567   110858   127255
>>>>>> 26764    27945
>>>>>>             1024      16  243398  261429    40238    23263    3987
>>>>>>  246400    3882   260746     4093   236652   236874   31429    25076
>>>>>>             1024      32  383109  422076    41731    41605    8277
>>>>>>  473441    7775   415261     8588   394765   407306   40089    28537
>>>>>>             1024      64  590145  619156    39623    53267   15051
>>>>>>  722717   14624   753000   257294   597784   620946   38619    44073
>>>>>>             1024     128 1077836 1124099    56192    64916   36851 1102176   37198  1082454   281548   829175   792604
>>>>>> 47975    51913
>>>>>>             1024     256  941918 1074331    72783    81450   26778 1099636   32395  1060013   183218  1024121   995171   44371    45448
>>>>>>             1024     512  697483 1130312   100324   114682   48215
>>>>>> 1041758   41480  1058967    90156   994020  1563622   56328    46370
>>>>>>             1024    1024  931702 1087111  4609294  4199201   44191
>>>>>>  949834   45594   970656    56674   933525  1075676   44876    46115
>>>>>>             2048       4   71438   67066    58319    38913    1147
>>>>>> 44147    1043    42916      967    66416    67205   45953    96750
>>>>>>             2048       8  141926  134567    61101    55445    2596
>>>>>> 77528    2564    80402     4258   124211   120747   53888   100337
>>>>>>             2048      16  254344  255585    71550    74500    5410
>>>>>>  139365    5201   141484     5171   205521   213113   67048    57304
>>>>>>             2048      32  397833  411261    56676    80027   10440  260034   10126   230238    10814   391665   383379   79333    60877
>>>>>>             2048      64  595167  687205    64262    87327   20772 456430   19960   477064    23190   540220   563096   86812    92565
>>>>>>             2048     128  833585  933403   121926   118621   37700
>>>>>>  690020   37575   733254   567449   712337   734006   92011   104934
>>>>>>             2048     256  799003  949499   143688   125659   40871 892757   37977   880494   458281   836263   901375  131332   110237
>>>>>>             2048     512  979936 1040724   120896   138013   54381
>>>>>>  859783   48721   780491   279203  1068824  1087085   97886    98078
>>>>>>             2048    1024  901754  987938    53352    53043   72727 1054522   68269   992275   181253  1309480  1524983  121600    95585
>>>>>>             2048    2048  831890 1021540  4257067  3302797   75672
>>>>>>  984203   80181   826209    94278   966920  1027159  111832   105921
>>>>>>             4096       4   66195   67316    62171    74785    1328
>>>>>> 28963    1329    26397     1223    71470    69317   55903    84915
>>>>>>             4096       8  122221  120057    90537    60958    2598
>>>>>> 47312    2468    59783     2640   128674   127872   41285    40422
>>>>>>             4096      16  238321  239251    29336    32121    4153
>>>>>> 89262    3986    96930     4608   229970   237108   55039    56983
>>>>>>             4096      32  417110  421356    30974    50000    8382
>>>>>>  156676    7886   153841     7900   359585   367288   26611    25952
>>>>>>             4096      64  648008  668066    32193    29389   14830
>>>>>>  273265   14822   282211    19653   581898   620798   51281    50218
>>>>>>             4096     128  779422  848564    55594    60253   37108 451296   35908   491361    37567   738163   728059   67681    66440
>>>>>>             4096     256  865623  886986    71368    63947   44255 645961   42689   719491   736707   819696   837641   57059    60347
>>>>>>             4096     512  852099  889650    68870    73891   31185
>>>>>>  845224   30259   830153   392334   910442   961983   60083    55558
>>>>>>             4096    1024  710357  867810    29377    29522   49954
>>>>>>  846640   43665   926298   213677   986226  1115445   55130    59205
>>>>>>             4096    2048  826479  908420    43191    42075   59684
>>>>>>  904022   58601   855664   115105  1418322  1524415   60548    66066
>>>>>>             4096    4096  793351  855111  3232454  3673419   66018
>>>>>>  861413   48833   847852    45914   852268   842075   42980    48374
>>>>>>             8192       4   67340   69421    42198    31740     994
>>>>>> 23251    1166    16813      837    73827    73126   25169    29610
>>>>>>             8192       8  137150  125622    29131    36439    2051
>>>>>> 44342    1988    48930     2315   134183   135367   31080    33573
>>>>>>             8192      16  237366  220826    24810    26584    3576
>>>>>> 88004    3769    78717     4289   233751   235355   23302    28742
>>>>>>             8192      32  457447  454404    31594    27750    8141
>>>>>>  142022    7846   143984     9322   353147   396188   34203    33265
>>>>>>             8192      64  670645  655259    28630    23255   16669
>>>>>>  237476   16965   244968    15607   590365   575320   49998    43305
>>>>>>             8192     128  658676  760982    44197    47802   28693
>>>>>>  379523   26614   378328    27184   720997   702038   51707    49733
>>>>>>             8192     256  643370  698683    56233    63165   28846 543952   27745   576739    44014   701007   725534   59611    58985
>>>>>>             8192     512  696884  776793    67258    52705   18711
>>>>>>  698854   21004   694124   621695   784812   773331   43101    47659
>>>>>>             8192    1024  729664  810451   15470    15875   31318
>>>>>>  801490   38123   812944   301222   804323   832765   54308    53376
>>>>>>             8192    2048  749217   68757    21914    22667   48971
>>>>>>  783309   48132   782738   172848   907408   929324   51156    50565
>>>>>>             8192    4096  707677  763960    32063    31928   47809 751692   49560   786339    93445  1046761  1297876   48037
>>>>>>    51680
>>>>>>             8192    8192  623817  746288  2815955  3137358   48722
>>>>>>  741633   35428   753787    49626   803683   823800   48977    52895
>>>>>>            16384       4   72372   73651    34471    30788     960
>>>>>> 23610     903    22316      891    71445    71138   56451    55129
>>>>>>            16384       8  137920  141704    50830    33857    1935
>>>>>> 41934    2275    35588     3608   130757   137801   51621    48525
>>>>>>            16384      16  245369  242460    41808    29770    3605
>>>>>> 75682    4355    75315     4767   241100   239693   53263    30785
>>>>>>            16384      32  448877  433956    31846    35010    7973
>>>>>>  118181    8819   112703     8177   381734   391651   57749    63417
>>>>>>            16384      64  710831  700712    66792    68864   20176
>>>>>>  209806   19034   207852    21255   589503   601379  104567   105162
>>>>>>            16384     128  836901  860867   104226   100373   40899
>>>>>>  358865   40946   360562    39415   675968   691538   96086   105695
>>>>>>            16384     256  798081  828146   107103   120433   39084 595325   39050   593110    56925   763466   797859  109645   113414
>>>>>>            16384     512  810851  843931   113564   106202   35111
>>>>>>  714831   46244   745947    53636   802902   760172  110492   100879
>>>>>>            16384    1024  726399  820219    22106    22987   53087 749053   54781   777705  1075341   772686   809723  100349    96619
>>>>>>            16384    2048  807772  856458    23920    23617   66320
>>>>>>  829576   72105   740848   656379   864539   835446   93499   101714
>>>>>>            16384    4096  797470  840596    27270    28132   88784
>>>>>>
>>>>> ...
>