This is resolved: when strapped for memory, the VM fragments the
reads/writes. I bypassed the VM by adding O_DIRECT to the open call in
backed_file_open (in bs_rdwr.c)... and the performance went up as
expected.

Chris

On Tue, Sep 23, 2008 at 7:59 AM, Chris Worley <worleys@xxxxxxxxx> wrote:
> On Tue, Sep 23, 2008 at 2:44 AM, FUJITA Tomonori
> <fujita.tomonori@xxxxxxxxxxxxx> wrote:
>> On Mon, 22 Sep 2008 10:21:32 -0600
>> "Chris Worley" <worleys@xxxxxxxxx> wrote:
>>
>>> On Fri, Sep 19, 2008 at 7:40 PM, Chris Worley <worleys@xxxxxxxxx> wrote:
>>> >
>>> > I'm running CentOS 5.2 targets w/ a 2.6.24 kernel. The initiator is
>>> > Win2003. On the initiator side, the fs is formatted NTFS w/ a 4K
>>> > block size (and the NTFS block size seems to have nothing to do w/
>>> > this issue).
>>> >
>>> > Watching iostat on the target side, everything is being written to
>>> > the underlying disk in 512-byte operations.
>>> >
>>> > Best I can tell, it's the Linux side that's fragmenting the I/O.
>>> >
>>> > I could get a lot better performance if these were coalesced into
>>> > larger, variable block sizes (i.e., what's being written from the
>>> > initiator side is much larger blocks).
>>> >
>>> > Is there something tgtd queries on the disk to get this information?
>>
>> tgtd doesn't do anything special. It opens a file on your file system
>> (or a device file such as /dev/sda) and performs read/write system
>> calls.
>>
>>
>>> > I don't see an fstat64 use of st_blksize in the source.
>>> >
>>> > I can put a dummy md "linear" device atop the disk and set the MD
>>> > device's chunk size to 4K... then everything to the MD device (as
>>> > well as to the underlying disk) is passed in 4K blocks... which
>>> > performs much better (except even larger blocks would get better
>>> > performance if the user is writing larger blocks... and smaller
>>> > blocks trigger a read-modify-write that causes 3x the I/O activity).
>>>
>>> I changed the MD to chunk at 8K blocks (and the NTFS on the w2003
>>> side to use 8K blocks), and tgtd was still chunking at 4K blocks.
>>>
>>> Does anybody have an idea where the fragmenting is occurring and/or
>>> how to stop it?
>>
>> Not sure, but I think the problem looks like a more generic one, not
>> specific to tgtd, right?
>>
>
> You're right in that it looks like tgtd gets a WRITE_10 command in
> bs_rdwr_request (in bs_rdwr.c) and calls pwrite64 w/ 64K lengths...
> and gets 64K returns.
>
> So, I'm not sure who's fragmenting the I/O operation... but it is
> getting fragmented.
>
> But, iostat shows those are getting chunked somewhere, i.e., the md
> device shows 4K chunking (where [read-B/s+write-B/s]/tps == 4096;
> here, (10.55 + 54.38) MB/s * 1048576 / 16621.20 tps ~= 4096 bytes per
> transfer):
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> md0           16621.20        10.55        54.38         52        271
>
> ... I've looked into the pwrite64 kernel syscall, and it calls
> vfs_write (in linux/fs/read_write.c), which can take one of two
> paths: 1) do_sync_write (which won't chunk the call), or 2) a
> callback in the pointer file->f_op->write. I don't have a clue who
> might fill in that callback, or what might be getting called there.
>
> I've never seen problems w/ pwrite/pread... so I'm very perplexed as
> to why this is getting fragmented.
>
> Chris
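
For anyone hitting the same wall, below is a minimal sketch of the kind
of one-flag change described at the top of the thread. backed_file_open()
here is a hypothetical stand-in for the helper in stgt's bs_rdwr.c; the
real function's signature and surrounding code differ, so treat this as
an illustration of OR-ing O_DIRECT into the open flags, not a patch.

    #define _GNU_SOURCE             /* O_DIRECT needs this on Linux */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical stand-in for the helper in bs_rdwr.c: open the
     * backing file with O_DIRECT OR-ed in, so I/O goes straight to the
     * device instead of through the page cache ("the VM"), which can
     * fragment requests under memory pressure. */
    static int backed_file_open(const char *path, int oflag, uint64_t *size)
    {
            struct stat st;
            int fd = open(path, oflag | O_DIRECT);

            if (fd < 0)
                    return fd;

            /* A regular file reports its size via fstat(); a block
             * device would need the BLKGETSIZE64 ioctl instead. */
            if (fstat(fd, &st) < 0) {
                    close(fd);
                    return -1;
            }
            *size = st.st_size;
            return fd;
    }

With the page cache out of the loop, a memory-starved VM can no longer
split the 64K pwrite64 calls into the 512-byte or 4K operations iostat
was showing.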
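
One caveat the thread doesn't mention, but which bites anyone enabling
O_DIRECT: Linux requires the user buffer, file offset, and transfer
length to be aligned (typically to the device's logical block size),
otherwise pwrite() fails with EINVAL. A minimal sketch of that pattern,
assuming 512-byte alignment; direct_pwrite() is a hypothetical helper,
not stgt code:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define DIO_ALIGN 512   /* assumed logical block size */

    /* Hypothetical helper: copy the caller's data into a suitably
     * aligned buffer and issue one direct write. With O_DIRECT, a
     * misaligned buffer, offset, or length makes pwrite() fail. */
    static ssize_t direct_pwrite(int fd, const void *data, size_t len,
                                 off_t off)
    {
            void *buf;
            ssize_t ret;

            if (len % DIO_ALIGN || off % DIO_ALIGN)
                    return -EINVAL;

            if (posix_memalign(&buf, DIO_ALIGN, len))
                    return -ENOMEM;

            memcpy(buf, data, len);
            ret = pwrite(fd, buf, len, off);
            free(buf);
            return ret;
    }

In practice you'd query the device for the real alignment (e.g. the
BLKSSZGET ioctl on a block device) rather than hard-coding 512.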