On Thu, 23 Jan 2014, Matthew Wilcox wrote:
> On Wed, Jan 22, 2014 at 03:51:56PM -0700, Ross Zwisler wrote:
> > > +			if (hole) {
> > >  				addr = NULL;
> > > -				hole = true;
> > >  				size = bh->b_size;
> > > +			} else {
> > > +				unsigned first;
> > > +				retval = xip_get_addr(inode, bh, &addr);
> > > +				if (retval < 0)
> > > +					break;
> > > +				size = retval;
> > > +				first = offset - (block << inode->i_blkbits);
> > > +				if (buffer_unwritten(bh))
> > > +					memset(addr, 0, first);
> > > +				addr += first;
> >
> > +				size -= first;
> >
> > This is needed so that we don't overrun the XIP buffer we are given in the
> > event that our user buffer >= our XIP buffer and the start of our I/O isn't
> > block aligned.
>
> You're right! Thank you! However, we also need it for the hole ==
> true case, don't we? So maybe something like this, incrementally on top of
> patch 22/22:
>
> P.S. Can someone come up with a better name for this variable than 'first'?
> I'd usually use 'offset', but that's already taken. 'annoying_bit' seems a
> bit judgemental. 'misaligned', maybe? 'skip' or 'seek' like dd uses?
>
> diff --git a/fs/xip.c b/fs/xip.c
> index 92157ff..1ae00db 100644
> --- a/fs/xip.c
> +++ b/fs/xip.c
> @@ -103,6 +103,7 @@ static ssize_t xip_io(int rw, struct inode *inode, const struct iovec *iov,
>
>  		if (max == offset) {
>  			sector_t block = offset >> inode->i_blkbits;
> +			unsigned first = offset - (block << inode->i_blkbits);
>  			long size;
>  			memset(bh, 0, sizeof(*bh));
>  			bh->b_size = ALIGN(end - offset, PAGE_SIZE);
> @@ -121,14 +122,12 @@ static ssize_t xip_io(int rw, struct inode *inode, const struct iovec *iov,
>
>  			if (hole) {
>  				addr = NULL;
> -				size = bh->b_size;
> +				size = bh->b_size - first;

It looks like we have an additional bit of complexity with the hole case.
The issue is that for holes, bh->b_size is just the full size of the write as
set earlier in the function:

	bh->b_size = ALIGN(end - offset, PAGE_SIZE);

From this code it seems like you hoped the call into get_block() would adjust
bh->b_size to the size of the hole, allowing you to zero just the hole space
in the user buffer. It doesn't look like it does, though, at least for ext4.
In looking at the direct I/O case (do_direct_IO()), they deal with holes on a
per FS block basis, and don't ever look at bh->b_size once they've figured out
the buffer is unmapped.

The result of this is that when you get a read that starts at a hole but moves
into real data, the read will just see a hole and return data of all zeros.

To just assume the current FS block is a hole, we can do something like this:

diff --git a/fs/xip.c b/fs/xip.c
index 35e401e..e902593 100644
--- a/fs/xip.c
+++ b/fs/xip.c
@@ -122,7 +122,7 @@ static ssize_t xip_io(int rw, struct inode *inode, const struct

 			if (hole) {
 				addr = NULL;
-				size = bh->b_size - first;
+				size = (1 << inode->i_blkbits) - first;
 			} else {
 				retval = xip_get_addr(inode, bh, &addr);
 				if (retval < 0)
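
For anyone following along, here is a rough user-space sketch of the arithmetic in the hole case. It is only an illustration: the helper names misalignment() and hole_chunk_size() are made up and do not exist in fs/xip.c, and blkbits stands in for inode->i_blkbits. It assumes, as proposed above, that only the remainder of the current FS block is treated as a hole.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in fs/xip.c): number of bytes between the start
 * of the FS block containing 'offset' and 'offset' itself. This mirrors the
 * 'first' variable in the patch:
 *	first = offset - (block << inode->i_blkbits);
 */
static uint64_t misalignment(uint64_t offset, unsigned blkbits)
{
	uint64_t block;

	assert(blkbits < 63);		/* keep the shifts well defined */
	block = offset >> blkbits;	/* index of the containing FS block */
	return offset - (block << blkbits);
}

/*
 * Hypothetical helper (not in fs/xip.c): how many bytes to treat as a hole
 * when only the current FS block is assumed to be unmapped. This mirrors
 *	size = (1 << inode->i_blkbits) - first;
 */
static uint64_t hole_chunk_size(uint64_t offset, unsigned blkbits)
{
	return ((uint64_t)1 << blkbits) - misalignment(offset, blkbits);
}
```

With 4096-byte blocks (blkbits == 12) and an I/O starting at offset 5000, misalignment() gives 904 and hole_chunk_size() gives 3192, so zeroing stops at the end of that block and the next pass of the loop can ask get_block() about the following block instead of assuming the whole request is a hole.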