Hello everyone!

Recently I have had the pleasure of working with some nice hardware and
the displeasure of seeing it fail commercially. While trying to optimize
performance, however, I noticed that in some cases the bottlenecks were
not in the hardware or my driver, but rather in the filesystem on top of
it. So maybe all this is still useful for improving said filesystems.

The hardware is basically a fast SSD. Performance tops out at about
650MB/s and is fairly insensitive to random access behaviour. Latency is
about 50us for 512B reads and near zero for writes, through the usual
cheating.

The numbers below were created with sysbench, using direct IO. Each
block is a matrix of requests per second, with block sizes from 512B to
16384B down the rows and thread counts from 1 to 128 across the columns.
There are four matrices per filesystem: reads and writes, both
sequential and random.

Ext4:
=====

seqrd        1        2        4        8       16       32       64      128
16384     4867     8717    16367    29249    39131    39140    39135    39123
 8192     6324    10889    19980    37239    66346    78444    78429    78409
 4096     9158    15810    26072    45999    85371   148061   157222   157294
 2048    15019    24555    35934    59698   106541   198986   313969   315566
 1024    24271    36914    51845    80230   136313   252832   454153   484120
  512    37803    62144    78952   111489   177844   314896   559295   615744

rndrd        1        2        4        8       16       32       64      128
16384     4770     8539    14715    23465    33630    39073    39101    39103
 8192     6138    11398    20873    35785    56068    75743    78364    78374
 4096     8338    15657    29648    53927    91854   136595   157279   157349
 2048    11985    22894    43495    81211   148029   239962   314183   315695
 1024    16529    31306    61307   114721   222700   387439   561810   632719
  512    20580    40160    73642   135711   294583   542828   795607   821025

seqwr        1        2        4        8       16       32       64      128
16384    37588    37600    37730    37680    37631    37664    37670    37662
 8192    77621    77737    77947    77967    77875    77939    77833    77574
 4096   124083   123171   121159   120947   120202   120315   119917   120236
 2048   158540   153993   151128   150663   150686   151159   150358   147827
 1024   183300   176091   170527   170919   169608   169900   169662   168622
  512   229167   231672   221629   220416   223490   217877   222390   219718

rndwr        1        2        4        8       16       32       64      128
16384    38932    38290    38200    38306    38421    38404    38329    38326
 8192    79790    77297    77464    77447    77420    77460    77495    77545
 4096   163985   157626   158232   158212   158102   158169   158273   158236
 2048   272261   322637   320032   320932   321597   322008   322242   322699
 1024   339647   609192   652655   644903   654604   658292   659119   659667
  512   403366   718426  1227643  1149850  1155541  1157633  1173567  1180710

Sequential writes are significantly worse than random writes. If someone
is interested, I can track down which lock is causing all this.
Sequential reads below 2k are also worse, although one might wonder
whether direct IO in 1k chunks makes sense at all. Random reads in the
last column scale very nicely with block size down to 1k, but hit some
problem at 512B. The machine could be CPU-bound at that point.
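To put the small-block direct IO question in context: every cell above
is essentially the throughput of a loop like the following minimal C
sketch of the rndrd case (not the actual sysbench code; path, file size
and iteration count are made up). With O_DIRECT every request is a
separate syscall plus a full device round trip, so at ~50us per 512B
read a single thread is capped near 20000 requests per second, which is
about what the 512B rndrd number for a single thread shows.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/mnt/test/testfile";	/* made up */
	size_t bs = 512;				/* block size under test */
	off_t filesize = 1 << 30;			/* made-up 1GiB test file */
	void *buf;
	long i;
	int fd;

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT wants buffer, offset and length aligned; 4096 is safe */
	if (posix_memalign(&buf, 4096, bs)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	for (i = 0; i < 100000; i++) {
		/* random, block-aligned offset - one syscall per request */
		off_t off = (random() % (filesize / bs)) * bs;
		if (pread(fd, buf, bs, off) != (ssize_t)bs) {
			perror("pread");
			return 1;
		}
	}
	free(buf);
	close(fd);
	return 0;
}

The sequential variants only differ in how off is chosen. And since
O_DIRECT requires alignment to the logical sector size, 512B is the
natural floor for these tables anyway.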
Btrfs:
======

seqrd        1        2        4        8       16       32       64      128
16384     3270     6582    12919    24866    36424    39682    39726    39721
 8192     4394     8348    16483    32165    54221    79256    79396    79415
 4096     6337    12024    21696    40569    74924   131763   158292   158763
 2048   297222   298299   294727   294740   296496   298517   300118   300740
 1024   583891   595083   584272   580965   584030   589115   599634   598054
  512  1103026  1175523  1134172  1133606  1123684  1123978  1156758  1130354

rndrd        1        2        4        8       16       32       64      128
16384     3252     6621    12437    20354    30896    39365    39115    39746
 8192     4273     8749    17871    32135    51812    72715    79443    79456
 4096     5842    11900    24824    48072    84485   128721   158631   158812
 2048     7177    12540    20244    27543    32386    34839    35728    35916
 1024     7178    12577    20341    27473    32656    34763    36056    35960
  512     7176    12554    20289    27603    32504    34781    35983    35919

seqwr        1        2        4        8       16       32       64      128
16384    13357    12838    12604    12596    12588    12641    12716    12814
 8192    21426    20471    20090    20097    20287    20236    20445    20528
 4096    30740    29187    28528    28525    28576    28580    28883    29258
 2048     2949     3214     3360     3431     3440     3498     3396     3498
 1024     2167     2205     2412     2376     2473     2221     2410     2420
  512     1888     1876     1926     1981     1935     1938     1957     1976

rndwr        1        2        4        8       16       32       64      128
16384    10985    19312    27430    27813    28157    28528    28308    28234
 8192    16505    29420    35329    34925    36020    34976    35897    35174
 4096    21894    31724    34106    34799    36119    36608    37571    36274
 2048     3637     8031    15225    22599    30882    31966    32567    32427
 1024     3704     8121    15219    23670    31784    33156    31469    33547
  512     3604     7988    15206    23742    32007    31933    32523    33667

Sequential writes below 4k perform drastically worse, which is quite
unexpected. Write performance across the board is horrible compared to
ext4. Sequential reads are much better, in particular in the <4k cases;
I would assume some sort of readahead is happening. Random reads <4k
again drop off significantly.

xfs:
====

seqrd        1        2        4        8       16       32       64      128
16384     4698     4424     4397     4402     4394     4398     4642     4679
 8192     6234     5827     5797     5801     5795     6114     5793     5812
 4096     9100     8835     8882     8896     8874     8890     8910     8906
 2048    14922    14391    14259    14248    14264    14264    14269    14273
 1024    23853    22690    22329    22362    22338    22277    22240    22301
  512    37353    33990    33292    33332    33306    33296    33224    33271

rndrd        1        2        4        8       16       32       64      128
16384     4585     8248    14219    22533    32020    38636    39033    39054
 8192     6032    11186    20294    34443    53112    71228    78197    78284
 4096     8247    15539    29046    52090    86744   125835   154031   157143
 2048    11950    22652    42719    79562   140133   218092   286111   314870
 1024    16526    31294    59761   112494   207848   348226   483972   574403
  512    20635    39755    73010   130992   270648   484406   686190   726615

seqwr        1        2        4        8       16       32       64      128
16384    39956    39695    39971    39913    37042    37538    36591    32179
 8192    67934    66073    30963    29038    29852    25210    23983    28272
 4096    89250    81417    28671    18685    12917    14870    22643    22237
 2048   140272   120588   140665   140012   137516   139183   131330   129684
 1024   217473   147899   210350   218526   219867   220120   219758   215166
  512   328260   181197   211131   263533   294009   298203   301698   298013

rndwr        1        2        4        8       16       32       64      128
16384    38447    38153    38145    38140    38156    38199    38208    38236
 8192    78001    76965    76908    76945    77023    77174    77166    77106
 4096   160721   156000   157196   157084   157078   157123   156978   157149
 2048   325395   317148   317858   318442   318750   318981   319798   320393
 1024   434084   649814   650176   651820   653928   654223   655650   655818
  512   501067   876555  1290292  1217671  1244399  1267729  1285469  1298522

Sequential reads are pretty horrible: they never get past single-thread
performance, no matter how many threads are used. Sequential writes are
hitting a hot lock again.
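As a side note, for anyone who wants to poke at that lock without
setting up sysbench: the seqwr workload is, in essence, N threads racing
to write consecutive blocks through O_DIRECT. Below is a rough
single-file C approximation (sysbench's fileio test actually spreads the
IO over many files; path, sizes and thread count here are invented):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int fd;
static const size_t bs = 4096;		/* block size under test */
static const long nblocks = 262144;	/* 1GiB file at 4k blocks */
static long next_block;			/* shared sequential cursor */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *writer(void *unused)
{
	void *buf;
	long blk;

	(void)unused;
	if (posix_memalign(&buf, 4096, bs))
		return NULL;
	for (;;) {
		/* claim the next block in file order */
		pthread_mutex_lock(&lock);
		blk = next_block++;
		pthread_mutex_unlock(&lock);
		if (blk >= nblocks)
			break;
		/* if the fs serializes these, more threads won't help */
		if (pwrite(fd, buf, bs, (off_t)blk * bs) != (ssize_t)bs)
			break;
	}
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t tid[16];		/* the "16 threads" column */
	int i;

	fd = open("/mnt/test/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (i = 0; i < 16; i++)
		pthread_create(&tid[i], NULL, writer, NULL);
	for (i = 0; i < 16; i++)
		pthread_join(tid[i], NULL);
	close(fd);
	return 0;
}

Compile with -pthread. The cursor mutex is only held for a counter
increment, so if throughput still fails to scale with the thread count,
the serialization happens inside the filesystem, which is what the flat
(or even degrading) seqwr rows suggest.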
So, if anyone would like to improve one of these filesystems and needs
more data, feel free to ping me.

Jörn

--
Victory in war is not repetitious.
-- Sun Tzu