Re: [patch] open.2: Fix up incorrect O_DIRECT alignment information

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Jeff,
>
> On Thu, Mar 28, 2013 at 6:28 PM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> Hi,
>>
>> The alignment text for O_DIRECT is slightly misleading:
>>
>>        Under  Linux  2.4, transfer sizes, and the alignment of the user buffer
>>        and the file offset must all be multiples of the logical block size  of
>>        the  file  system.   Under  Linux 2.6, alignment to 512-byte boundaries
>>        suffices.
>>
>> The last sentence is incorrect.  You cannot perform O_DIRECT I/O in
>> sizes smaller than the underlying logical block size (for block devices
>> and block-based file systems; nfs is a different beast).  I've attached
>> my proposed change.  Comments and word-smithing are most welcome.
>
> Could you provide some detail on how you determined, verified, or
> tested this detail? I'm not saying you're wrong, but that text has
> been there a long time (since before I was maintainer), in the days
> when there was no VCS and changelog. It's even possible that I sent
> the patch to Andries that added this piece, since I have an idea that
> I tested this detail on Linux 2.6 at some point in the distant past.
> Indeed, I just now did a (very) quick test using an old program that
> suggests that you can do O_DIRECT I/O with block sizes smaller than the
> LBS.

Details on the test?

Devices do not accept I/O that is smaller than a single addressable
logical block.  Thus, you cannot perform direct I/O in sizes smaller
than that (there is no read-modify-write in O_DIRECT).  The code that
enforces the limitation is in fs/direct-io.c:

static inline ssize_t
do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
        struct block_device *bdev, const struct iovec *iov, loff_t offset,
        unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
        dio_submit_t submit_io, int flags)
{
...
        unsigned i_blkbits = ACCESS_ONCE(inode->i_blkbits);
        unsigned blkbits = i_blkbits;
        unsigned blocksize_mask = (1 << blkbits) - 1;
        ssize_t retval = -EINVAL;
...
        if (offset & blocksize_mask) { // offset is not a multiple of the file system block size
                if (bdev)
                        blkbits = blksize_bits(bdev_logical_block_size(bdev));
                blocksize_mask = (1 << blkbits) - 1;
                if (offset & blocksize_mask)
                        goto out; // offset is not a multiple of the logical block size
        }

        // The same check is then applied to the address and length of each segment:

        /* Check the memory alignment.  Blocks cannot straddle pages */
        for (seg = 0; seg < nr_segs; seg++) {
                addr = (unsigned long)iov[seg].iov_base;
                size = iov[seg].iov_len;
                end += size;
                if (unlikely((addr & blocksize_mask) ||
                             (size & blocksize_mask))) {
                        if (bdev)
                                blkbits = blksize_bits(
                                         bdev_logical_block_size(bdev));
                        blocksize_mask = (1 << blkbits) - 1;
                        if ((addr & blocksize_mask) || (size & blocksize_mask))
                                goto out;
                }
        }

...
out:
	return retval;
}
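To make the rule concrete, here is a minimal user-space sketch of the same two-stage test, not kernel code. The names dio_seg, dio_request_ok, dio_round_up and dio_alloc are hypothetical; fs_blkbits stands in for inode->i_blkbits, lbs_bits for blksize_bits(bdev_logical_block_size(bdev)), and has_bdev for the "if (bdev)" test. A real program would get the logical block size from the device (for example with ioctl(fd, BLKSSZGET, &lbs) on the block device) rather than hard-coding it.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() */

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one struct iovec segment. */
struct dio_seg {
    uintptr_t addr; /* user buffer address (iov_base) */
    size_t len;     /* segment length (iov_len) */
};

/*
 * Sketch of the alignment test quoted from fs/direct-io.c.
 * fs_blkbits: log2 of the file system block size (inode->i_blkbits).
 * lbs_bits:   log2 of the device's logical block size.
 * has_bdev:   mirrors the "if (bdev)" test in the kernel code.
 * Returns false where the kernel would "goto out" with -EINVAL.
 */
static bool dio_request_ok(uint64_t offset, const struct dio_seg *segs,
                           size_t nr_segs, unsigned fs_blkbits,
                           unsigned lbs_bits, bool has_bdev)
{
    unsigned blkbits = fs_blkbits;
    uint64_t mask = (UINT64_C(1) << blkbits) - 1;

    /* Offset: try the file system block size, then the logical block size. */
    if (offset & mask) {
        if (has_bdev)
            blkbits = lbs_bits;
        mask = (UINT64_C(1) << blkbits) - 1;
        if (offset & mask)
            return false;
    }

    /* Each segment's address and length get the same two-stage check. */
    for (size_t i = 0; i < nr_segs; i++) {
        uint64_t addr = segs[i].addr;
        uint64_t len = segs[i].len;

        if ((addr & mask) || (len & mask)) {
            if (has_bdev)
                blkbits = lbs_bits;
            mask = (UINT64_C(1) << blkbits) - 1;
            if ((addr & mask) || (len & mask))
                return false;
        }
    }
    return true;
}

/* Round len up to a multiple of lbs (which must be a power of two). */
static size_t dio_round_up(size_t len, size_t lbs)
{
    assert(lbs != 0 && (lbs & (lbs - 1)) == 0);
    return (len + lbs - 1) & ~(lbs - 1);
}

/* Allocate an lbs-aligned buffer whose size is len rounded up to lbs.
 * Returns NULL on failure; release with free(). */
static void *dio_alloc(size_t len, size_t lbs)
{
    void *buf = NULL;

    if (posix_memalign(&buf, lbs, dio_round_up(len, lbs)) != 0)
        return NULL;
    return buf;
}
```

With 4096-byte file system blocks and 512-byte logical blocks, a 512-byte segment at a 4096-aligned address and file offset 512 is accepted; the same request against a device with 4096-byte logical blocks (lbs_bits = 12) is rejected, because neither the 512-byte offset nor the 512-byte length is a multiple of 4096. On a block-based file system, whose block size is a multiple of the logical block size, the first stage is only a fast path: with a block device present, a request passes exactly when the offset, buffer address and length are all multiples of the logical block size.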

Cheers,
Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



