inode size is 256.

Pretty stuck with these settings and ext4. I missed the memo that Gluster started to prefer xfs; back in the 2.x days xfs was not the preferred filesystem. At this point it's a 340TB filesystem with 160TB used. I just added more space, and was doing some followup testing and wasn't impressed with the results. But I am sure I was happier with the performance before.

Still running CentOS 5.8.

Anything else I could look at?

Thanks,

Tom


On Mar 7, 2013, at 5:04 PM, Bryan Whitehead <driver at megahappy.net> wrote:

> I'm sure you know, but xfs is the recommended filesystem for glusterfs. Ext4 has a number of issues. (Particularly on CentOS/Redhat6).
>
> The default inode size for ext4 (and xfs) is small for the number of extended attributes glusterfs uses. This causes a minor hit in performance on xfs if the extended attributes grow more than 256 (xfs default size). In xfs, this is fixed by setting the size of an inode to 512. How big the impact is on ext4 is something I don't know offhand. But looking at a couple of boxes I have it looks like some ext4 filesystems have 128 inode size and some have 256 inode size (both of which are too small for glusterfs). The performance hit is that every time extended attributes need to be read, several inodes need to be seeked and found.
>
> run "dumpe2fs -h <blockdevice> | grep size" on your ext4 mountpoints.
>
> If it is not too much of a bother - I'd try xfs as your filesystem for the bricks
>
> mkfs.xfs -i size=512 <blockdevice>
>
> Please see this for more detailed info:
> https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Storage/2.0/html-single/Administration_Guide/index.html#chap-User_Guide-Setting_Volumes
>
>
> On Thu, Mar 7, 2013 at 12:08 PM, Thomas Wakefield <twake at cola.iges.org> wrote:
> Everything is built as ext4, no options other than lazy_itable_init=1 when I built the filesystems.
>
> Server mount example:
> LABEL=disk2a /storage/disk2a ext4 defaults 0 0
>
> Client mount:
> fs-disk2:/shared /shared glusterfs defaults 0 0
>
> Remember, the slow reads are only from gluster clients; the disks are really fast when I am local on the server testing the disks.
>
>
> -Tom
>
>
> On Mar 7, 2013, at 1:09 PM, Bryan Whitehead <driver at megahappy.net> wrote:
>
>> Was just thinking, what are your mount options for your bricks (using inode64?)? Also, you are using xfs... right?
>>
>> When you created the filesystems did you allocate more inode space? -i size=512 ?
>>
>>
>> On Thu, Mar 7, 2013 at 5:49 AM, Thomas Wakefield <twake at cola.iges.org> wrote:
>> Still looking for help.
>>
>>
>> On Mar 4, 2013, at 7:43 AM, Thomas Wakefield <twake at iges.org> wrote:
>>
>>> Also, I tested an NFS mount over the same 10Gb link, and was able to pull almost 200MB/s. But Gluster is still much slower. I also ran a longer test, 105GB of data, and it still showed that writing is MUCH faster. Which makes no sense when the disks can read 2x as fast as they can write.
>>>
>>> Any other thoughts?
>>>
>>> [root@cpu_crew1 ~]# dd if=/dev/zero of=/shared/working/benchmark/test.cpucrew1 bs=512k count=200000 ; dd if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=512k
>>> 200000+0 records in
>>> 200000+0 records out
>>> 104857600000 bytes (105 GB) copied, 159.135 seconds, 659 MB/s
>>> 200000+0 records in
>>> 200000+0 records out
>>> 104857600000 bytes (105 GB) copied, 1916.87 seconds, 54.7 MB/s
>>>
>>>
>>> On Mar 1, 2013, at 9:58 AM, Thomas Wakefield <twake at iges.org> wrote:
>>>
>>>> The max setting for performance.read-ahead-page-count is 16, which I did just try. No significant change.
>>>>
>>>> Any other setting options?
>>>>
>>>>
>>>> On Feb 28, 2013, at 10:18 PM, Anand Avati <anand.avati at gmail.com> wrote:
>>>>
>>>>> Can you try "gluster volume set <volname> performance.read-ahead-page-count 64" or some value higher or lower?
>>>>>
>>>>> Avati
>>>>>
>>>>> On Thu, Feb 28, 2013 at 7:15 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>>> Good point, forgot to set a blocksize; here are the redone dd tests:
>>>>>
>>>>> [root@cpu_crew1 ~]# dd if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=128k
>>>>> 40000+0 records in
>>>>> 40000+0 records out
>>>>> 5242880000 bytes (5.2 GB) copied, 65.4928 seconds, 80.1 MB/s
>>>>> [root@cpu_crew1 ~]# dd if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=1M
>>>>> 5000+0 records in
>>>>> 5000+0 records out
>>>>> 5242880000 bytes (5.2 GB) copied, 49.0907 seconds, 107 MB/s
>>>>> [root@cpu_crew1 ~]# dd if=/shared/working/benchmark/test.cpucrew1 of=/dev/null bs=4M
>>>>> 1250+0 records in
>>>>> 1250+0 records out
>>>>> 5242880000 bytes (5.2 GB) copied, 44.5724 seconds, 118 MB/s
>>>>>
>>>>> Still not impressive.
>>>>>
>>>>> -Tom
>>>>>
>>>>>
>>>>> On Feb 28, 2013, at 8:42 PM, Jeff Anderson-Lee <jonah at eecs.berkeley.edu> wrote:
>>>>>
>>>>>> Thomas,
>>>>>>
>>>>>> You have not specified a block size, so you are doing a huge number of small(ish) reads with associated round trips. What happens with dd bs=128k ..?
>>>>>>
>>>>>> Jeff Anderson-Lee
>>>>>>
>>>>>> On 2/28/2013 5:30 PM, Thomas Wakefield wrote:
>>>>>>> Did a fresh dd test just to confirm, same results:
>>>>>>>
>>>>>>> [root@cpu_crew1 benchmark]# dd if=/dev/zero of=/shared/working/benchmark/test.cpucrew1 bs=512k count=10000
>>>>>>> 10000+0 records in
>>>>>>> 10000+0 records out
>>>>>>> 5242880000 bytes (5.2 GB) copied, 7.43695 seconds, 705 MB/s
>>>>>>> [root@cpu_crew1 benchmark]# dd if=/shared/working/benchmark/test.cpucrew1 of=/dev/null
>>>>>>> 552126+0 records in
>>>>>>> 552125+0 records out
>>>>>>> 282688000 bytes (283 MB) copied, 37.8514 seconds, 7.5 MB/s
>>>>>>>
>>>>>>>
>>>>>>> On Feb 28, 2013, at 8:14 PM, Bryan Whitehead <driver at megahappy.net> wrote:
>>>>>>>
>>>>>>>> How are you doing the reading? Is this still an iozone benchmark?
>>>>>>>>
>>>>>>>> if you simply dd if=/glustermount/bigfile of=/dev/null, is the speed better?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Feb 28, 2013 at 5:05 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>>>>>> I get great speed locally; it's only when I add gluster in that it slows down. I get 2GB/s locally to the exact same brick. It's gluster that is having the read issue (80MB/s). But Gluster can write just fine, 800MB/s.
>>>>>>>>
>>>>>>>> The blockdev idea is a good one, and I have already done it. Thanks though.
>>>>>>>>
>>>>>>>> -Tom
>>>>>>>>
>>>>>>>> On Feb 28, 2013, at 7:53 PM, Ling Ho <ling at slac.stanford.edu> wrote:
>>>>>>>>
>>>>>>>>> Tom,
>>>>>>>>>
>>>>>>>>> What type of disks do you have? If they are raid 5 or 6, have you tried setting the read-ahead size to 8192 or 16384 (blockdev --setra 8192 /dev/<sd?>)?
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> ling
>>>>>>>>>
>>>>>>>>> On 02/28/2013 04:23 PM, Thomas Wakefield wrote:
>>>>>>>>>>
>>>>>>>>>> Did anyone else have any ideas on performance tuning for reads?
>>>>>>>>>>
>>>>>>>>>> On Feb 27, 2013, at 9:29 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Bryan-
>>>>>>>>>>>
>>>>>>>>>>> Yes I can write at 700-800MBytes/sec, but I can only read at 70-80 MBytes/sec. I would be very happy if I could get it to read at the same speed it can write at. And the 70-80 is sequential, not random, for reads; the same exact test commands on the disk server are in the 2+GB/s range, so I know the disk server can do it.
>>>>>>>>>>>
>>>>>>>>>>> -Tom
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Feb 27, 2013, at 7:41 PM, Bryan Whitehead <driver at megahappy.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Are your figures 700-800MByte/sec? Because that is probably as fast as your 10G nic cards are able to do. You can test that by trying to push a large amount of data over nc or ftp.
>>>>>>>>>>>>
>>>>>>>>>>>> Might want to try Infiniband. 40G cards are pretty routine.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 27, 2013 at 3:45 PM, Thomas Wakefield <twake at iges.org> wrote:
>>>>>>>>>>>> I also get the same performance running iozone for large file sizes, iozone -u 1 -r 512k -s 2G -I -F.
>>>>>>>>>>>>
>>>>>>>>>>>> Large file IO is what I need the system to do. I am just shocked at the huge difference between local IO and gluster client IO. I know there should be some difference, but 10x is unacceptable.
>>>>>>>>>>>>
>>>>>>>>>>>> -Tom
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Feb 27, 2013, at 5:31 PM, Bryan Whitehead <driver at megahappy.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Every time you open/close a file or a directory you will have to wait for locks which take time. This is totally expected.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why don't you share what you want to do? iozone benchmarks look like crap but serving qcow2 files to qemu works fantastic for me. What are you doing? Make a benchmark that does that. If you are going to have many files with a wide variety of sizes glusterfs/fuse might not be what you are looking for.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Feb 27, 2013 at 12:56 PM, Thomas Wakefield <twake at cola.iges.org> wrote:
>>>>>>>>>>>>> I have tested everything, small and large files. I have used file sizes ranging from 128k up to multiple GB files. All the reads are bad.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is a fairly exhaustive iozone auto test:
>>>>>>>>>>>>>
>>>>>>>>>>>>> KB reclen write rewrite read reread random_read random_write bkwd_read record_rewrite stride_read fwrite frewrite fread freread
>>>>>>>>>>>>> 64 4 40222 63492 26868 30060 1620 71037 1572 70570 31294 77096 72475 14736 13928
>>>>>>>>>>>>> 64 8 99207 116366 13591 13513 3214 97690 3155 109978 28920 152018 158480 18936 17625
>>>>>>>>>>>>> 64 16 230257 253766 25156 28713 10867 223732 8873 244297 54796 303383 312204 15062 13545
>>>>>>>>>>>>> 64 32 255943 234481 5735102 7100397 11897 318502 13681 347801 24214 695778 528618 25838 28094
>>>>>>>>>>>>> 64 64 214096 681644 6421025 7100397 27453 292156 28117 621657 27338 376062 512471 28569 32534
>>>>>>>>>>>>> 128 4 74329 75468 26428 41089 1131 72857 1118 66976 1597 73778 78343 13351 13026
>>>>>>>>>>>>> 128 8 100862 135170 24966 16734 2617 118966 2560 120406 39156 125121 146613 16177 16180
>>>>>>>>>>>>> 128 16 115114 253983 28212 17854 5307 246180 5431 229843 47335 255920 271173 27256 24445
>>>>>>>>>>>>> 128 32 256042 391360 39848 64258 11329 290230 9905 429563 38176 490380 463696 20917 19219
>>>>>>>>>>>>> 128 64 248573 592699 4557257 6812590 19583 452366 29263 603357 42967 814915 692017 76327 37604
>>>>>>>>>>>>> 128 128 921183 526444 5603747 5379161 45614 390222 65441 826202 41384 662962 1040839 78526 39023
>>>>>>>>>>>>> 256 4 76212 77337 40295 32125 1289 71866 1261 64645 1436 57309 53048 23073 29550
>>>>>>>>>>>>> 256 8 126922 141976 26237 25130 2566 128058 2565 138981 2985 125060 133603 22840 24955
>>>>>>>>>>>>> 256 16 242883 263636 41850 24371 4902 250009 5290 248792 89353 243821 247303 26965 26199
>>>>>>>>>>>>> 256 32 409074 439732 40101 39335 11953 436870 11209 430218 83743 409542 479390 30821 27750
>>>>>>>>>>>>> 256 64 259935 571502 64840 71847 22537 617161 23383 392047 91852 672010 802614 41673 53111
>>>>>>>>>>>>> 256 128 847597 812329 185517 83198 49383 708831 44668 794889 74267 1180188 1662639 54303 41018
>>>>>>>>>>>>> 256 256 481324 709299 5217259 5320671 44668 719277 40954 808050 41302 790209 771473 62224 35754
>>>>>>>>>>>>> 512 4 77667 75226 35102 29696 1337 66262 1451 67680 1413 69265 69142 42084 27897
>>>>>>>>>>>>> 512 8 134311 144341 30144 24646 2102 134143 2209 134699 2296 108110 128616 25104 29123
>>>>>>>>>>>>> 512 16 200085 248787 30235 25697 4196 247240 4179 256116 4768 250003 226436 32351 28455
>>>>>>>>>>>>> 512 32 330341 439805 26440 39284 8744 457611 8006 424168 125953 425935 448813 27660 26951
>>>>>>>>>>>>> 512 64 483906 733729 48747 41121 16032 555938 17424 587256 187343 366977 735740 41700 41548
>>>>>>>>>>>>> 512 128 836636 907717 69359 94921 42443 761031 36828 964378 123165 651383 695697 58368 44459
>>>>>>>>>>>>> 512 256 520879 860437 145534 135523 40267 847532 31585 663252 69696 1270846 1492545 48822 48092
>>>>>>>>>>>>> 512 512 782951 973118 3099691 2942541 42328 871966 46218 911184 49791 953248 1036527 52723 48347
>>>>>>>>>>>>> 1024 4 76218 69362 36431 28711 1137 66171 1174 68938 1125 70566 70845 34942 28914
>>>>>>>>>>>>> 1024 8 126045 140524 37836 15664 2698 126000 2557 125566 2567 110858 127255 26764 27945
>>>>>>>>>>>>> 1024 16 243398 261429 40238 23263 3987 246400 3882 260746 4093 236652 236874 31429 25076
>>>>>>>>>>>>> 1024 32 383109 422076 41731 41605 8277 473441 7775 415261 8588 394765 407306 40089 28537
>>>>>>>>>>>>> 1024 64 590145 619156 39623 53267 15051 722717 14624 753000 257294 597784 620946 38619 44073
>>>>>>>>>>>>> 1024 128 1077836 1124099 56192 64916 36851 1102176 37198 1082454 281548 829175 792604 47975 51913
>>>>>>>>>>>>> 1024 256 941918 1074331 72783 81450 26778 1099636 32395 1060013 183218 1024121 995171 44371 45448
>>>>>>>>>>>>> 1024 512 697483 1130312 100324 114682 48215 1041758 41480 1058967 90156 994020 1563622 56328 46370
>>>>>>>>>>>>> 1024 1024 931702 1087111 4609294 4199201 44191 949834 45594 970656 56674 933525 1075676 44876 46115
>>>>>>>>>>>>> 2048 4 71438 67066 58319 38913 1147 44147 1043 42916 967 66416 67205 45953 96750
>>>>>>>>>>>>> 2048 8 141926 134567 61101 55445 2596 77528 2564 80402 4258 124211 120747 53888 100337
>>>>>>>>>>>>> 2048 16 254344 255585 71550 74500 5410 139365 5201 141484 5171 205521 213113 67048 57304
>>>>>>>>>>>>> 2048 32 397833 411261 56676 80027 10440 260034 10126 230238 10814 391665 383379 79333 60877
>>>>>>>>>>>>> 2048 64 595167 687205 64262 87327 20772 456430 19960 477064 23190 540220 563096 86812 92565
>>>>>>>>>>>>> 2048 128 833585 933403 121926 118621 37700 690020 37575 733254 567449 712337 734006 92011 104934
>>>>>>>>>>>>> 2048 256 799003 949499 143688 125659 40871 892757 37977 880494 458281 836263 901375 131332 110237
>>>>>>>>>>>>> 2048 512 979936 1040724 120896 138013 54381 859783 48721 780491 279203 1068824 1087085 97886 98078
>>>>>>>>>>>>> 2048 1024 901754 987938 53352 53043 72727 1054522 68269 992275 181253 1309480 1524983 121600 95585
>>>>>>>>>>>>> 2048 2048 831890 1021540 4257067 3302797 75672 984203 80181 826209 94278 966920 1027159 111832 105921
>>>>>>>>>>>>> 4096 4 66195 67316 62171 74785 1328 28963 1329 26397 1223 71470 69317 55903 84915
>>>>>>>>>>>>> 4096 8 122221 120057 90537 60958 2598 47312 2468 59783 2640 128674 127872 41285 40422
>>>>>>>>>>>>> 4096 16 238321 239251 29336 32121 4153 89262 3986 96930 4608 229970 237108 55039 56983
>>>>>>>>>>>>> 4096 32 417110 421356 30974 50000 8382 156676 7886 153841 7900 359585 367288 26611 25952
>>>>>>>>>>>>> 4096 64 648008 668066 32193 29389 14830 273265 14822 282211 19653 581898 620798 51281 50218
>>>>>>>>>>>>> 4096 128 779422 848564 55594 60253 37108 451296 35908 491361 37567 738163 728059 67681 66440
>>>>>>>>>>>>> 4096 256 865623 886986 71368 63947 44255 645961 42689 719491 736707 819696 837641 57059 60347
>>>>>>>>>>>>> 4096 512 852099 889650 68870 73891 31185 845224 30259 830153 392334 910442 961983 60083 55558
>>>>>>>>>>>>> 4096 1024 710357 867810 29377 29522 49954 846640 43665 926298 213677 986226 1115445 55130 59205
>>>>>>>>>>>>> 4096 2048 826479 908420 43191 42075 59684 904022 58601 855664 115105 1418322 1524415 60548 66066
>>>>>>>>>>>>> 4096 4096 793351 855111 3232454 3673419 66018 861413 48833 847852 45914 852268 842075 42980 48374
>>>>>>>>>>>>> 8192 4 67340 69421 42198 31740 994 23251 1166 16813 837 73827 73126 25169 29610
>>>>>>>>>>>>> 8192 8 137150 125622 29131 36439 2051 44342 1988 48930 2315 134183 135367 31080 33573
>>>>>>>>>>>>> 8192 16 237366 220826 24810 26584 3576 88004 3769 78717 4289 233751 235355 23302 28742
>>>>>>>>>>>>> 8192 32 457447 454404 31594 27750 8141 142022 7846 143984 9322 353147 396188 34203 33265
>>>>>>>>>>>>> 8192 64 670645 655259 28630 23255 16669 237476 16965 244968 15607 590365 575320 49998 43305
>>>>>>>>>>>>> 8192 128 658676 760982 44197 47802 28693 379523 26614 378328 27184 720997 702038 51707 49733
>>>>>>>>>>>>> 8192 256 643370 698683 56233 63165 28846 543952 27745 576739 44014 701007 725534 59611 58985
>>>>>>>>>>>>> 8192 512 696884 776793 67258 52705 18711 698854 21004 694124 621695 784812 773331 43101 47659
>>>>>>>>>>>>> 8192 1024 729664 810451 15470 15875 31318 801490 38123 812944 301222 804323 832765 54308 53376
>>>>>>>>>>>>> 8192 2048 749217 68757 21914 22667 48971 783309 48132 782738 172848 907408 929324 51156 50565
>>>>>>>>>>>>> 8192 4096 707677 763960 32063 31928 47809 751692 49560 786339 93445 1046761 1297876 48037 51680
>>>>>>>>>>>>> 8192 8192 623817 746288 2815955 3137358 48722 741633 35428 753787 49626 803683 823800 48977 52895
>>>>>>>>>>>>> 16384 4 72372 73651 34471 30788 960 23610 903 22316 891 71445 71138 56451 55129
>>>>>>>>>>>>> 16384 8 137920 141704 50830 33857 1935 41934 2275 35588 3608 130757 137801 51621 48525
>>>>>>>>>>>>> 16384 16 245369 242460 41808 29770 3605 75682 4355 75315 4767 241100 239693 53263 30785
>>>>>>>>>>>>> 16384 32 448877 433956 31846 35010 7973 118181 8819 112703 8177 381734 391651 57749 63417
>>>>>>>>>>>>> 16384 64 710831 700712 66792 68864 20176 209806 19034 207852 21255 589503 601379 104567 105162
>>>>>>>>>>>>> 16384 128 836901 860867 104226 100373 40899 358865 40946 360562 39415 675968 691538 96086 105695
>>>>>>>>>>>>> 16384 256 798081 828146 107103 120433 39084 595325 39050 593110 56925 763466 797859 109645 113414
>>>>>>>>>>>>> 16384 512 810851 843931 113564 106202 35111 714831 46244 745947 53636 802902 760172 110492 100879
>>>>>>>>>>>>> 16384 1024 726399 820219 22106 22987 53087 749053 54781 777705 1075341 772686 809723 100349 96619
>>>>>>>>>>>>> 16384 2048 807772 856458 23920 23617 66320 829576 72105 740848 656379 864539 835446 93499 101714
>>>>>>>>>>>>> 16384 4096 797470 840596 27270
> ...
>
> [Message clipped]
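
As a sanity check on the dd figures quoted in this thread, here is a minimal sketch that recomputes throughput from the byte counts and elapsed times of the 105 GB run. It is an illustration only and not part of the original messages: the function name dd_throughput_mb_s and the regular expression are assumptions made for this example, and MB is taken as 10^6 bytes because the quoted dd output reports 104857600000 bytes as 105 GB.

```python
import re

# Matches the summary line dd printed in the thread, e.g.
# "104857600000 bytes (105 GB) copied, 159.135 seconds, 659 MB/s".
# The pattern is an assumption based only on the lines quoted above.
DD_SUMMARY = re.compile(r"(\d+) bytes \([^)]*\) copied, ([0-9.]+) seconds")


def dd_throughput_mb_s(summary_line):
    """Return throughput in MB/s (1 MB = 10**6 bytes) for one dd summary line."""
    match = DD_SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a dd summary line: %r" % summary_line)
    byte_count = int(match.group(1))
    seconds = float(match.group(2))
    return byte_count / seconds / 1e6


# Summary lines copied from the Mar 4 message (write pass, then read pass).
WRITE_LINE = "104857600000 bytes (105 GB) copied, 159.135 seconds, 659 MB/s"
READ_LINE = "104857600000 bytes (105 GB) copied, 1916.87 seconds, 54.7 MB/s"

write_mb_s = dd_throughput_mb_s(WRITE_LINE)
read_mb_s = dd_throughput_mb_s(READ_LINE)

print("write: %.1f MB/s" % write_mb_s)
print("read:  %.1f MB/s" % read_mb_s)
print("write/read ratio: %.1f" % (write_mb_s / read_mb_s))
```

On those figures the write pass comes out roughly 12x faster than the read pass through the gluster client, which is the gap the thread is trying to explain.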