On Tue, 8 Jul 2008, tom@xxxxxxxxxxxxx wrote:
Could I pick your brain on XFS and the disk scheduler for 20 minutes, please? We are building a media downloading service and are trying to optimize the I/O on the hardware (24 Barracuda ES 0.5TB drives). For now we have only been able to get about 25MB/s out of one drive, through the southbridge, into memory, back down the southbridge and out the Ethernet using lighttpd. We are using XFS with the anticipatory scheduler.

- Can't get the -o filestreams mount option to work.
- Not sure how to tune the "wait" time of the anticipatory scheduler, or whether we might be better off with CFQ or deadline. Latency is not an issue at all (up to a second, I guess); throughput is.
- Does XFS automatically cache the entire directory structure, metadata, etc. in memory, or is there a setting I can use to force that? I want to remove those seeks.

Scheduler comparison so far:

[cfq]:          21968.59 Kbytes/s
[anticipatory]: 25044.67 Kbytes/s
[noop]:         24737.24 Kbytes/s
[deadline]:     24921.77 Kbytes/s

The physical file size is 1GB on average, which we make available as 100MB downloads. We use lighttpd as the front end and want it to fetch from disk in 1MB increments.
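On the anticipatory scheduler's "wait" time: the knob is antic_expire under /sys/block/<disk>/queue/iosched/. Here is a rough sketch of where these settings live; sdb, /data and the numbers are only examples, not tuned for your workload, and I have not used the filestreams allocator myself, so treat that part as untested:

# Show the available schedulers; the active one is in brackets
cat /sys/block/sdb/queue/scheduler
# Switch schedulers at runtime
echo anticipatory > /sys/block/sdb/queue/scheduler

# antic_expire is how long AS waits for another nearby read before it
# services other requests; a larger value favours throughput over latency
cat /sys/block/sdb/queue/iosched/antic_expire
echo 10 > /sys/block/sdb/queue/iosched/antic_expire   # example value only

# filestreams is a mount option; check /proc/mounts to see whether it stuck
mount -o filestreams /dev/sdb1 /data
grep /data /proc/mounts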
Can anyone answer Tom's questions regarding the filestreams option?

Also, you need to state the chipset etc. you are using. I am not sure what kind of RAID controller, motherboard, or chipset you have; those are the most important components to know before one can even begin to look at your problem. What do you get with ext3 -- also 25MiB/s? It sounds like a possible IRQ conflict/issue; also, some RAID cards need 'noapic' passed as a kernel argument, which fixes the problem in some cases. What type of drives, and what type of interconnect are you using? SATA I? SATA II? SAS? Fibre Channel?

XFS is one of the best and fastest Linux filesystems out there (well, ext4 is on its way to beating it, but for now XFS is king). If you are getting 25MiB/s to one drive, it sounds like you need to look for other bottlenecks in your configuration.

hdparm -tT /dev/sda
hdparm -tT /dev/sdb

750GiB disk:
/dev/sda:
 Timing cached reads:   5274 MB in  2.00 seconds = 2639.40 MB/sec
 Timing buffered disk reads:  264 MB in  3.00 seconds =  87.95 MB/sec   * real disk speed

VelociRaptor:
/dev/sda:
 Timing cached reads:   6240 MB in  2.00 seconds = 3121.89 MB/sec
 Timing buffered disk reads:  352 MB in  3.01 seconds = 116.81 MB/sec   * real disk speed

Read: 3ware performance problems
http://forums.storagereview.net/index.php?showtopic=25923
If you are using a 3ware card, expect to get poor performance.

# xfs_info /dev/md3
meta-data=/dev/md3       isize=256     agcount=32, agsize=20603904 blks
         =               sectsz=4096   attr=2
data     =               bsize=4096    blocks=659324160, imaxpct=5
         =               sunit=256     swidth=2304 blks
naming   =version 2      bsize=4096
log      =internal       bsize=4096    blocks=32768, version=2
         =               sectsz=4096   sunit=1 blks, lazy-count=0
realtime =none           extsz=9437184 blocks=0, rtextents=0

The important values here are sunit and swidth; I suggest reading up on them. If they are both set to 0, the striping at the FS level is not distributing the data efficiently across all disks. (A hypothetical example of setting them at mkfs time is in the P.S. at the end of this message.)

Regarding XFS caching, it depends on the mount options as to how much is 'cached'. I use the following options (maximum number of logbufs and the largest log buffer size available):

/dev/md3  /r1  xfs  defaults,noatime,nodiratime,logbufs=8,logbsize=262144  0 1

See man mount for more options.

As CFQ is now the default I switched over to it; before the proper optimizations it is about the same speed as everything else:
http://home.comcast.net/~jpiszcz/20080625/scheduler_comparison.html

To speed it up:

# Fix slice_idle.
# See http://www.nextre.it/oracledocs/ioscheduler_03.html
# DISKS holds the drive names, e.g. DISKS="sda sdb sdc"
echo "Fixing slice_idle to 0..."
for i in $DISKS
do
    echo "Changing slice_idle to 0 on $i"
    echo 0 > /sys/block/"$i"/queue/iosched/slice_idle
done

Including xfs@xxxxxxxxxxx & linux-raid@xxxxxxxxxxxxxxxx.

Justin.
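P.S. If you end up recreating the filesystem, sunit/swidth can be given to mkfs.xfs explicitly so they match the array geometry. The numbers below are purely hypothetical -- a 64KiB chunk and 22 data disks (24 drives in RAID-6) -- so substitute your real chunk size, data-disk count and md device, and remember that mkfs destroys the existing filesystem:

mkfs.xfs -d su=64k,sw=22 /dev/md0   # su = RAID chunk size, sw = number of data disks
xfs_info /dev/md0                   # sunit/swidth should now reflect the array geometry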