On 1/16/14, 5:21 PM, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 04:52:05PM -0600, Eric Sandeen wrote:
>> On 1/15/14, 4:38 PM, Dave Chinner wrote:
>>> On Wed, Jan 15, 2014 at 11:59:45AM -0600, Eric Sandeen wrote:
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 33ad9a7..1f3431f 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -1587,7 +1587,12 @@ xfs_file_ioctl(
>>>>  			XFS_IS_REALTIME_INODE(ip) ?
>>>>  			mp->m_rtdev_targp : mp->m_ddev_targp;
>>>>  
>>>> -		da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
>>>> +		/*
>>>> +		 * Report device physical sector size as "optimal" minimum,
>>>> +		 * unless blocksize is smaller than that.
>>>> +		 */
>>>> +		da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
>>>
>>> Just grab the filesystem block size from the xfs_mount:
>>>
>>> 	da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>>>
>>>> +		da.d_mem = da.d_miniosz;
>>>
>>> I'd suggest that this should be PAGE_SIZE - it's for memory buffer
>>> alignment, not IO alignment, so using the IO alignment just seems
>>> wrong to me...
>>
>> Ok.  Was just sticking close to what we had before.
>>
>> So:
>> 	da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>> 	da.d_mem = PAGE_SIZE;
>>
>> ?  Then we can have a minimum IO size of 512, and a memory alignment
>> of 4k; isn't that a little odd?
>>
>> (IOWs we could do 512-aligned memory before, right?  What's the
>> downside, or the value in changing it now?)
>
> We can do arbitrary byte aligned buffers if I understand
> get_user_pages() correctly - it just maps the page under the buffer
> into the kernel address space and then the bio is pointed at it.
> AFAICT, the reason for the "memory buffer needs 512 byte alignment"
> is simply that this is the minimum IO size supported.

Actually, it's fs/direct-io.c which enforces this (not sure why I
couldn't find that before), in do_blockdev_direct_IO(); the *enforced*
minimum memory alignment is the bdev's logical block size.

> However, for large IOs, 512 byte alignment is not really optimal.  If
> we don't align the buffer to PAGE_SIZE, then we have partial head
> and tail pages, so for a 512k IO we need to map 129 pages into a bio
> instead of 128.  When you have hardware that can only handle 128
> segments in a DMA transfer, that means the 512k IO needs to be sent
> in two IOs rather than one.

Ok, but I have a bit of a problem with changing what XFS_IOC_DIOINFO
reports.  (I had originally thought to change the minimum IO size, but
I have talked myself out of that, too.)

The xfsctl(3) manpage says that XFS_IOC_DIOINFO:

  "Get(s) information required to perform direct I/O on the specified
  file descriptor."

and

  "the user's data buffer must conform to the same type of constraints
  as required for accessing a raw disk partition."

IOWs, the ioctl is documented as returning minimum, not optimal,
requirements, and it has always been implemented as such.  Changing its
meaning now seems wrong.  At the very least, I would not like to do so
as part of this functional change; that would probably be best done in
a new ioctl which reports both minimum & optimal sizes.  And at that
point we should just do a vfs interface.  :)

Of course, applications don't have to use the minimum sizes reported by
the ioctl; they are free to be smarter and use larger sizes and
alignments.  But if the ioctl was designed and documented to report
required *minimums*, I think we should leave it as such.
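To make that concrete, here's an untested sketch of the consumer side
(assumes the xfsprogs headers are installed; rounding the alignment up
to the page size is the application "being smarter", not something the
ioctl requires):

	/*
	 * Untested sketch: query XFS_IOC_DIOINFO, then do a direct
	 * read that honors the reported minimums.
	 */
	#define _GNU_SOURCE		/* O_DIRECT */
	#include <xfs/xfs.h>		/* struct dioattr, XFS_IOC_DIOINFO */
	#include <sys/ioctl.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		struct dioattr	da;
		size_t		align, len;
		void		*buf;
		int		fd;

		fd = open(argv[1], O_RDONLY | O_DIRECT);
		if (fd < 0 || ioctl(fd, XFS_IOC_DIOINFO, &da) < 0) {
			perror(argv[1]);
			return 1;
		}

		/* d_mem/d_miniosz are required minimums; larger is fine. */
		align = da.d_mem;
		if (align < (size_t)sysconf(_SC_PAGESIZE))
			align = sysconf(_SC_PAGESIZE);
		len = da.d_miniosz;

		if (posix_memalign(&buf, align, len))
			return 1;

		printf("d_mem %u d_miniosz %u d_maxiosz %u -> read %zd\n",
		       da.d_mem, da.d_miniosz, da.d_maxiosz,
		       read(fd, buf, len));

		free(buf);
		close(fd);
		return 0;
	}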
I'm going to resend the change, split up a bit more to separate
cleanups from functional changes, and maybe we can discuss the ioctl
change as a potential additional patch.

Thanks,
-Eric

> There's quite a bit of hardware out there that has a limit of 128
> segments per IO....
>
> Cheers,
>
> Dave.
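(For the archives, Dave's head/tail page arithmetic as a standalone
userspace demo - assuming 4k pages, and obviously not kernel code:

	#include <stdio.h>

	#define PAGE_SIZE	4096UL

	/* Number of pages a buffer at `addr' of length `len' straddles. */
	static unsigned long pages_spanned(unsigned long addr, unsigned long len)
	{
		return ((addr + len - 1) / PAGE_SIZE) - (addr / PAGE_SIZE) + 1;
	}

	int main(void)
	{
		unsigned long len = 512 * 1024;

		/* page-aligned 512k buffer: exactly 128 pages */
		printf("page-aligned:     %lu\n", pages_spanned(0x100000, len));
		/* 512-byte-aligned buffer: partial head+tail, 129 pages */
		printf("512-byte aligned: %lu\n", pages_spanned(0x100200, len));
		return 0;
	}

which prints 128 and 129 respectively, hence the extra bio segment.)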