On Tue, Sep 23, 2008 at 2:44 AM, FUJITA Tomonori
<fujita.tomonori@xxxxxxxxxxxxx> wrote:
> On Mon, 22 Sep 2008 10:21:32 -0600
> "Chris Worley" <worleys@xxxxxxxxx> wrote:
>
>> On Fri, Sep 19, 2008 at 7:40 PM, Chris Worley <worleys@xxxxxxxxx> wrote:
>> >
>> > I'm running CentOS 5.2 targets w/ a 2.6.24 kernel. The initiator is
>> > Win2003. On the initiator side, the fs is formated NTFS w/ a 4K block
>> > size (and the NTFS block size seems to have nothing to do w/ this
>> > issue).
>> >
>> > Watching iostat on the target side, everything is being written to the
>> > underlying disk in 512 byte operations.
>> >
>> > Best I can tell, it's the Linux side that's fragmenting the I/O.
>> >
>> > I could get a lot better performance if these were coalesced into
>> > larger, variable, block sizes (i.e. what's being written from the
>> > initiator side is much larger blocks).
>> >
>> > Is there something tgtd queries on the disk to get this information?
>
> tgtd doesn't do anything special. It opens a file on your file system
> (or a device file such as /dev/sda) and performs read/write system
> calls.
>
>
>> > I don't see an fstat64 use of st_blksize in the source.
>> >
>> > I can put a dummy md "linear" device atop the disk and set the MD
>> > device's chunk size to 4K... then everything to the MD device (as well
>> > as to the underlying disk) is passed in 4K blocks... which performs
>> > much better (except even larger blocks would get better performance if
>> > the user is writing larger blocks... and smaller blocks do a
>> > read-modify-write that causes 3x the IO activity to perform).
>>
>> I changed the MD to chunk at 8K blocks (and the NTFS on the w2003
>> side to use 8k blocks), and the tgtd was still chunking at 4K blocks.
>>
>> Does anybody have an idea where the fragmenting is occurring and/or
>> how to stop it?
>
> Not sure, but I think that the problem looks more generic one, not
> specific to tgtd, right?
>

You're right in that it looks like tgtd gets a WRITE_10 command in
bs_rdwr_request (in bs_rdwr.c), and calls pwrite64 w/ 64K lengths...
and gets 64K returns. So, I'm not sure who's fragmenting the I/O
operation... but it is getting fragmented.

iostat shows those writes are getting chunked somewhere below the
pwrite64 call, i.e., the md device shows 4K chunking (where
[read-B/s+write-B/s]/tps == 4096):

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
md0           16621.20        10.55        54.38         52        271
...

I've looked into the pwrite64 kernel syscall, and it calls vfs_write
(in linux/fs/read_write.c), which could take two paths:

1) do_sync_write (which won't chunk the call), or
2) a callback in the pointer file->f_op->write.

I don't have a clue who might fill in the callback, or what might be
getting called there.

I've never seen problems w/ pwrite/pread... so I'm very perplexed as
to why this is getting fragmented.

Chris
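
P.S. For anyone who wants to check the [read-B/s+write-B/s]/tps
arithmetic against the md0 figures above, here is a minimal sketch.
The helper name avg_request_bytes is made up for illustration, and it
assumes that iostat's "MB" means 1024*1024 bytes; if your iostat
counts differently, adjust BYTES_PER_MB.

```python
# Assumption: one iostat "MB" is 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def avg_request_bytes(tps, mb_read_per_s, mb_wrtn_per_s):
    """Average bytes per I/O request: (read B/s + write B/s) / tps.

    Hypothetical helper, not part of tgtd or iostat.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    return (mb_read_per_s + mb_wrtn_per_s) * BYTES_PER_MB / tps


# Figures from the md0 line of the iostat output: tps, MB_read/s, MB_wrtn/s.
md0_avg = avg_request_bytes(16621.20, 10.55, 54.38)
print(round(md0_avg))  # prints 4096
```

By the same formula, 64K requests carrying the same 64.93 MB/s would
show up as roughly 1/16th of the tps, which is a quick way to tell
from iostat alone whether the chunking has gone away.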