Re: iostat shows all tgt I/O in 512 byte operations... how to coalesce?

"Chris Worley" <worleys@xxxxxxxxx> · Tue, 23 Sep 2008 07:59:24 -0600

On Tue, Sep 23, 2008 at 2:44 AM, FUJITA Tomonori
<fujita.tomonori@xxxxxxxxxxxxx> wrote:
> On Mon, 22 Sep 2008 10:21:32 -0600
> "Chris Worley" <worleys@xxxxxxxxx> wrote:
>
>> On Fri, Sep 19, 2008 at 7:40 PM, Chris Worley <worleys@xxxxxxxxx> wrote:
>> >
>> > I'm running CentOS 5.2 targets w/ a 2.6.24 kernel.  The initiator is
>> > Win2003.  On the initiator side, the fs is formatted NTFS w/ a 4K block
>> > size (and the NTFS block size seems to have nothing to do w/ this
>> > issue).
>> >
>> > Watching iostat on the target side, everything is being written to the
>> > underlying disk in 512 byte operations.
>> >
>> > Best I can tell, it's the Linux side that's fragmenting the I/O.
>> >
>> > I could get a lot better performance if these were coalesced into
>> > larger, variable, block sizes (i.e. what's being written from the
>> > initiator side is much larger blocks).
>> >
>> > Is there something tgtd queries on the disk to get this information?
>
> tgtd doesn't do anything special. It opens a file on your file system
> (or a device file such as /dev/sda) and performs read/write system
> calls.
>
>
>> > I don't see an fstat64 use of st_blksize in the source.
>> >
>> > I can put a dummy md "linear" device atop the disk and set the MD
>> > device's chunk size to 4K... then everything to the MD device (as well
>> > as to the underlying disk) is passed in 4K blocks... which performs
>> > much better (except even larger blocks would get better performance if
>> > the user is writing larger blocks... and smaller blocks do a
>> > read-modify-write that causes 3x the IO activity to perform).
>>
>> I changed the MD to chunk at 8K blocks (and the NTFS on the w2003
>> side to use 8k blocks), and the tgtd was still chunking at 4K blocks.
>>
>> Does anybody have an idea where the fragmenting is occurring and/or
>> how to stop it?
>
> Not sure, but I think the problem looks like a more generic one, not
> specific to tgtd, right?
>

You're right: it looks like tgtd gets a WRITE_10 command in
bs_rdwr_request (in bs_rdwr.c), calls pwrite64 w/ 64K lengths, and
gets 64K back as the return value.
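
For anyone who wants to reproduce that observation outside tgtd, here
is a minimal userspace sketch in Python. It is not tgtd's code: the
helper name write_chunk is hypothetical, and a temporary file stands
in for the backing store. It only shows that a single 64K positional
write reports 64K written.

```python
import os
import tempfile

CHUNK = 64 * 1024  # 64K, the length seen in the pwrite64 calls


def write_chunk(fd: int, offset: int, length: int = CHUNK) -> int:
    """Issue one positional write and return the byte count reported."""
    buf = b"\0" * length
    # os.pwrite wraps the pwrite system call; it may legally return
    # fewer bytes than requested, so the caller should check the result.
    return os.pwrite(fd, buf, offset)


if __name__ == "__main__":
    # Assumption: a temp file stands in for the real backing device.
    with tempfile.TemporaryFile() as f:
        n = write_chunk(f.fileno(), 0)
        print(f"requested {CHUNK} bytes, pwrite returned {n}")
```

A full-length return only says the syscall accepted all the bytes. It
says nothing about how the request is split up further down.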

So, I'm not sure who's fragmenting the I/O operation... but it is
getting fragmented.

iostat shows the writes are getting chunked somewhere, i.e., the md
device shows 4K chunking (where [read-B/s + write-B/s]/tps == 4096):

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
md0           16621.20        10.55        54.38         52        271
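
The arithmetic behind that 4096 figure can be checked with a short
Python sketch. The function name avg_request_bytes is hypothetical,
and it assumes iostat's MB columns mean 2**20 bytes.

```python
MIB = 1024 * 1024  # assumption: iostat reports MB as 2**20 bytes


def avg_request_bytes(tps: float, mb_read_s: float, mb_wrtn_s: float) -> float:
    """Average bytes per I/O request: total throughput divided by tps."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return (mb_read_s + mb_wrtn_s) * MIB / tps


if __name__ == "__main__":
    # Values taken from the md0 line above.
    size = avg_request_bytes(16621.20, 10.55, 54.38)
    print(f"average request size: {size:.1f} bytes")
```

With the md0 numbers this comes out within a byte of 4096, which is
why I read it as 4K chunking.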

... I've looked into the pwrite64 syscall in the kernel: it calls
vfs_write (in linux/fs/read_write.c), which can take one of two paths:
1) do_sync_write (which won't chunk the call) or 2) a callback through
the pointer file->f_op->write.  I don't have a clue who fills in that
callback, or what ends up getting called there.

I've never seen problems w/ pwrite/pread... so I'm very perplexed
as to why this is getting fragmented.

Chris