Re: O_DIRECT change

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sage,

We are on Dragon Boat Festival holidays. :) I will take a close look
and test it next week.
--
Henry

2011/6/4 Sage Weil <sage@xxxxxxxxxxxx>:
> Hi Henry,
>
> I made a small change to the O_DIRECT path to zero holes properly in
> commit 85defe7 (below). ÂDo you mind reviewing the change, and/or testing,
> since you are the main O_DIRECT user? ÂThe test case that was failing is
> here:
>
> Â Â Â Âhttp://tracker.newdream.net/issues/1096#note-19
>
> The problem was that the read coming down from the VFS isn't trimmed to
> i_size, so the old zero tail check wasn't true, and we would set
> *checkeof. ÂThen ceph_aio_read would getattr and loop, since we
> didn't actually read eof (due to a hole).
>
> Actually, I suspect the *checkeof part is still incorrect... does the
> zeroing part at least look right to you?
>
> Thanks!
> sage
>
>
> From 85defe76f7e2a0b3d285a3be72fcffce96629b5c Mon Sep 17 00:00:00 2001
> From: Sage Weil <sage@xxxxxxxxxxxx>
> Date: Wed, 1 Jun 2011 16:08:44 -0700
> Subject: [PATCH] ceph: fix short sync reads from the OSD
>
> If we get a short read from the OSD because the object is small, we need to
> zero the remainder of the buffer. ÂFor O_DIRECT reads, the attempted range
> is not trimmed to i_size by the VFS, so we were actually looping
> indefinitely.
>
> Fix by trimming by i_size, and the unconditionally zeroing the trailing
> range.
>
> Reported-by: Jeff Wu <cpwu@xxxxxxxxxxxxx>
> Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx>
> ---
> Âfs/ceph/file.c | Â 28 +++++++++++++++-------------
> Â1 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 8c5ac4e..b654f40 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -283,7 +283,7 @@ int ceph_release(struct inode *inode, struct file *file)
> Âstatic int striped_read(struct inode *inode,
> Â Â Â Â Â Â Â Â Â Â Â Âu64 off, u64 len,
> Â Â Â Â Â Â Â Â Â Â Â Âstruct page **pages, int num_pages,
> - Â Â Â Â Â Â Â Â Â Â Â int *checkeof, bool align_to_pages,
> + Â Â Â Â Â Â Â Â Â Â Â int *checkeof, bool o_direct,
> Â Â Â Â Â Â Â Â Â Â Â Âunsigned long buf_align)
> Â{
> Â Â Â Âstruct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> @@ -308,7 +308,7 @@ static int striped_read(struct inode *inode,
> Â Â Â Âio_align = off & ~PAGE_MASK;
>
> Âmore:
> - Â Â Â if (align_to_pages)
> + Â Â Â if (o_direct)
> Â Â Â Â Â Â Â Âpage_align = (pos - io_align + buf_align) & ~PAGE_MASK;
> Â Â Â Âelse
> Â Â Â Â Â Â Â Âpage_align = pos & ~PAGE_MASK;
> @@ -346,20 +346,22 @@ more:
> Â Â Â Â}
>
> Â Â Â Âif (was_short) {
> - Â Â Â Â Â Â Â /* was original extent fully inside i_size? */
> - Â Â Â Â Â Â Â if (pos + left <= inode->i_size) {
> - Â Â Â Â Â Â Â Â Â Â Â dout("zero tail\n");
> - Â Â Â Â Â Â Â Â Â Â Â ceph_zero_page_vector_range(page_off + read, len - read,
> + Â Â Â Â Â Â Â /* did we bounce off eof? */
> + Â Â Â Â Â Â Â if (pos + left > inode->i_size)
> + Â Â Â Â Â Â Â Â Â Â Â *checkeof = 1;
> +
> + Â Â Â Â Â Â Â /* zero trailing bytes (inside i_size) */
> + Â Â Â Â Â Â Â if (left > 0 && pos < inode->i_size) {
> + Â Â Â Â Â Â Â Â Â Â Â if (pos + left > inode->i_size)
> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â left = inode->i_size - pos;
> +
> + Â Â Â Â Â Â Â Â Â Â Â dout("zero tail %d\n", left);
> + Â Â Â Â Â Â Â Â Â Â Â ceph_zero_page_vector_range(page_off + read, left,
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âpages);
> - Â Â Â Â Â Â Â Â Â Â Â read = len;
> - Â Â Â Â Â Â Â Â Â Â Â goto out;
> + Â Â Â Â Â Â Â Â Â Â Â read += left;
> Â Â Â Â Â Â Â Â}
> -
> - Â Â Â Â Â Â Â /* check i_size */
> - Â Â Â Â Â Â Â *checkeof = 1;
> Â Â Â Â}
>
> -out:
> Â Â Â Âif (ret >= 0)
> Â Â Â Â Â Â Â Âret = read;
> Â Â Â Âdout("striped_read returns %d\n", ret);
> @@ -659,7 +661,7 @@ out:
>
> Â Â Â Â Â Â Â Â/* hit EOF or hole? */
> Â Â Â Â Â Â Â Âif (statret == 0 && *ppos < inode->i_size) {
> - Â Â Â Â Â Â Â Â Â Â Â dout("aio_read sync_read hit hole, reading more\n");
> + Â Â Â Â Â Â Â Â Â Â Â dout("aio_read sync_read hit hole, ppos %lld < size %lld, reading more\n", *ppos, inode->i_size);
> Â Â Â Â Â Â Â Â Â Â Â Âread += ret;
> Â Â Â Â Â Â Â Â Â Â Â Âbase += ret;
> Â Â Â Â Â Â Â Â Â Â Â Âlen -= ret;
> --
> 1.7.0
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux