Hi Sage, We are on Dragon Boat Festival holidays. :) I will take a close look and test it next week. -- Henry 2011/6/4 Sage Weil <sage@xxxxxxxxxxxx>: > Hi Henry, > > I made a small change to the O_DIRECT path to zero holes properly in > commit 85defe7 (below). ÂDo you mind reviewing the change, and/or testing, > since you are the main O_DIRECT user? ÂThe test case that was failing is > here: > > Â Â Â Âhttp://tracker.newdream.net/issues/1096#note-19 > > The problem was that the read coming down from the VFS isn't trimmed to > i_size, so the old zero tail check wasn't true, and we would set > *checkeof. ÂThen ceph_aio_read would getattr and loop, since we > didn't actually read eof (due to a hole). > > Actually, I suspect the *checkeof part is still incorrect... does the > zeroing part at least look right to you? > > Thanks! > sage > > > From 85defe76f7e2a0b3d285a3be72fcffce96629b5c Mon Sep 17 00:00:00 2001 > From: Sage Weil <sage@xxxxxxxxxxxx> > Date: Wed, 1 Jun 2011 16:08:44 -0700 > Subject: [PATCH] ceph: fix short sync reads from the OSD > > If we get a short read from the OSD because the object is small, we need to > zero the remainder of the buffer. ÂFor O_DIRECT reads, the attempted range > is not trimmed to i_size by the VFS, so we were actually looping > indefinitely. > > Fix by trimming by i_size, and the unconditionally zeroing the trailing > range. > > Reported-by: Jeff Wu <cpwu@xxxxxxxxxxxxx> > Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx> > --- > Âfs/ceph/file.c | Â 28 +++++++++++++++------------- > Â1 files changed, 15 insertions(+), 13 deletions(-) > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index 8c5ac4e..b654f40 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -283,7 +283,7 @@ int ceph_release(struct inode *inode, struct file *file) > Âstatic int striped_read(struct inode *inode, > Â Â Â Â Â Â Â Â Â Â Â Âu64 off, u64 len, > Â Â Â Â Â Â Â Â Â Â Â Âstruct page **pages, int num_pages, > - Â Â Â Â Â Â Â Â Â Â Â int *checkeof, bool align_to_pages, > + Â Â Â Â Â Â Â Â Â Â Â int *checkeof, bool o_direct, > Â Â Â Â Â Â Â Â Â Â Â Âunsigned long buf_align) > Â{ > Â Â Â Âstruct ceph_fs_client *fsc = ceph_inode_to_client(inode); > @@ -308,7 +308,7 @@ static int striped_read(struct inode *inode, > Â Â Â Âio_align = off & ~PAGE_MASK; > > Âmore: > - Â Â Â if (align_to_pages) > + Â Â Â if (o_direct) > Â Â Â Â Â Â Â Âpage_align = (pos - io_align + buf_align) & ~PAGE_MASK; > Â Â Â Âelse > Â Â Â Â Â Â Â Âpage_align = pos & ~PAGE_MASK; > @@ -346,20 +346,22 @@ more: > Â Â Â Â} > > Â Â Â Âif (was_short) { > - Â Â Â Â Â Â Â /* was original extent fully inside i_size? */ > - Â Â Â Â Â Â Â if (pos + left <= inode->i_size) { > - Â Â Â Â Â Â Â Â Â Â Â dout("zero tail\n"); > - Â Â Â Â Â Â Â Â Â Â Â ceph_zero_page_vector_range(page_off + read, len - read, > + Â Â Â Â Â Â Â /* did we bounce off eof? */ > + Â Â Â Â Â Â Â if (pos + left > inode->i_size) > + Â Â Â Â Â Â Â Â Â Â Â *checkeof = 1; > + > + Â Â Â Â Â Â Â /* zero trailing bytes (inside i_size) */ > + Â Â Â Â Â Â Â if (left > 0 && pos < inode->i_size) { > + Â Â Â Â Â Â Â Â Â Â Â if (pos + left > inode->i_size) > + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â left = inode->i_size - pos; > + > + Â Â Â Â Â Â Â Â Â Â Â dout("zero tail %d\n", left); > + Â Â Â Â Â Â Â Â Â Â Â ceph_zero_page_vector_range(page_off + read, left, > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âpages); > - Â Â Â Â Â Â Â Â Â Â Â read = len; > - Â Â Â Â Â Â Â Â Â Â Â goto out; > + Â Â Â Â Â Â Â Â Â Â Â read += left; > Â Â Â Â Â Â Â Â} > - > - Â Â Â Â Â Â Â /* check i_size */ > - Â Â Â Â Â Â Â *checkeof = 1; > Â Â Â Â} > > -out: > Â Â Â Âif (ret >= 0) > Â Â Â Â Â Â Â Âret = read; > Â Â Â Âdout("striped_read returns %d\n", ret); > @@ -659,7 +661,7 @@ out: > > Â Â Â Â Â Â Â Â/* hit EOF or hole? */ > Â Â Â Â Â Â Â Âif (statret == 0 && *ppos < inode->i_size) { > - Â Â Â Â Â Â Â Â Â Â Â dout("aio_read sync_read hit hole, reading more\n"); > + Â Â Â Â Â Â Â Â Â Â Â dout("aio_read sync_read hit hole, ppos %lld < size %lld, reading more\n", *ppos, inode->i_size); > Â Â Â Â Â Â Â Â Â Â Â Âread += ret; > Â Â Â Â Â Â Â Â Â Â Â Âbase += ret; > Â Â Â Â Â Â Â Â Â Â Â Âlen -= ret; > -- > 1.7.0 > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html