On Thu, Mar 20, 2008 at 6:20 AM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 19 Mar 2008 18:08:07 -0700 Jeremy Higdon <jeremy@xxxxxxx> wrote:
> >
> > (cc's added. It matters)
> >
> > On Thu, Mar 20, 2008 at 10:16:54AM +1100, David Chinner wrote:
> > > 4p ia64, 24GB RAM, 2.6.25-rc3, qla1280, 15krpm scsi disk.
> > >
> > > Direct I/O:
> > >
> > > dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=1024 oflag=direct
> > > 1024+0 records in
> > > 1024+0 records out
> > > 1073741824 bytes (1.1 GB) copied, 27.8974 s, 38.5 MB/s
> > >
> > > Doing approximately 80 512k I/os per second (disk bandwidth).
> > >
> > > Buffered I/O:
> > >
> > > dgc@budgie:~/xfstests$ sudo dd if=/dev/zero of=/dev/sdb6 bs=1024k count=4096
> > > 4096+0 records in
> > > 4096+0 records out
> > > 4294967296 bytes (4.3 GB) copied, 427.872 s, 10.0 MB/s
> >
> > How big is sdb6? How many '2's do you see in
> >
> > factor `cat /sys/block/sdb/sdb6/size`
>
> There have always been problems with this and I'm not sure that anyone
> cared enough about buffered writes to blockdevs to get to the bottom of
> them.
>
> I assume you aren't running i386 highmem...

I've experienced the same kind of degradation with buffered IO vs direct,
specifically when using Linux partitions. Using the full block device
doesn't create such fragmented IOs.

The problem was reported to the blktrace list some weeks ago by my
coworker (cc'ing Ming):
http://marc.info/?l=linux-btrace&m=120296070516776&w=2
(fyi, Ming forgot to use oflag=sync; this explains the weird results
when doing buffered writes while blktrace'ing)

To summarize a little more (without messing around with partition
alignment), the test system is x86_64 with 4GB RAM, storage is directly
connected via aacraid, 7200 rpm SATA disk.

Using:
dd if=/dev/zero of=/dev/sdhX bs=1M oflag=sync count=4 seek=2
and
dd if=/dev/zero of=/dev/sdhX bs=1M oflag=direct count=4 seek=2

full disk case (sdh):
buffered writes are +8 and being merged to 3 512k requests, 1 8k and 1 504k (27MB/s)
odirect writes are all +512 (35MB/s)

partitioned case: a 3GB sdh1 and ~720GB sdh2.
buffered writes to partition1 are +1 and are merged to 65k requests (10.3MB/s)
buffered writes to partition2 are +2 and are merged to 130k requests (15.2MB/s)
odirect writes to either partition are all +512 (27MB/s)

So it appears partition size matters (at least in this case)?

As you can see, performing buffered writes to a partition resulted in
very small requests, much like David reported in his original post
(+1 or +2 via blktrace).

This happens with every kernel tried: 2.6.22, 2.6.24, RHEL5U1, etc.
cfq vs deadline doesn't change anything.

For partitions, changing partition alignment to a power of 2 actually hurt!?

Mike
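
The "+N" figures above and the quoted `factor` question are both stated in 512-byte sectors, so a small helper can make them easier to compare. What follows is only a sketch, not something the thread confirms: it assumes that the kernel picks a block device's soft block size as the largest power of two, capped at the page size, that evenly divides the device size, which is one possible reading of why the number of '2's in the partition size was asked about. The function names and the 4096-byte page size are hypothetical choices made for illustration (ia64 systems commonly use larger pages).

```python
SECTOR_BYTES = 512  # blktrace "+N" and /sys/block/.../size both count 512-byte sectors


def sectors_to_bytes(n_sectors: int) -> int:
    """Convert a blktrace '+N' sector count into bytes."""
    return n_sectors * SECTOR_BYTES


def trailing_twos(size_sectors: int) -> int:
    """Count how many times 2 divides the size, i.e. the '2's printed by factor."""
    if size_sectors <= 0:
        raise ValueError("size must be a positive sector count")
    count = 0
    while size_sectors % 2 == 0:
        size_sectors //= 2
        count += 1
    return count


def largest_pow2_block(size_sectors: int, page_bytes: int = 4096) -> int:
    """Largest power-of-two block size in bytes, capped at page_bytes,
    that evenly divides the device size. ASSUMPTION: this mirrors how the
    kernel chooses the soft block size for a block device."""
    size_bytes = size_sectors * SECTOR_BYTES
    bsize = SECTOR_BYTES
    while bsize < page_bytes and size_bytes % (bsize * 2) == 0:
        bsize *= 2
    return bsize


if __name__ == "__main__":
    # Request sizes quoted in the message, converted to bytes.
    for n in (1, 2, 8, 512):
        print(f"+{n} sectors = {sectors_to_bytes(n)} bytes")
    # Hypothetical partition sizes in sectors, NOT the ones from the test system.
    for size in (6291455, 6291454, 6291456):
        print(size, trailing_twos(size), largest_pow2_block(size))
```

Under that assumption, a partition with an odd sector count would get 512-byte blocks and one whose size is divisible by 2 but not by 4 would get 1024-byte blocks, which would at least be consistent with the +1 and +2 buffered writes seen on sdh1 and sdh2. Running `cat /sys/block/sdh/sdh1/size` on the test system and feeding the result to `largest_pow2_block` would be one way to check whether this reading holds.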