On Wed, Sep 30, 2015 at 05:30:17PM -0400, Sasha Levin wrote: > > So I've traced this all the way back to dax_io(). I can trigger this with: > > > > diff --git a/fs/dax.c b/fs/dax.c > > index 93bf2f9..2cdb8a5 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -178,6 +178,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter, > > if (need_wmb) > > wmb_pmem(); > > > > + WARN_ON((pos == start) && (pos - start > iov_iter_count(iter))); > > return (pos == start) ? retval : pos - start; > > } > > > > So it seems that iter gets moved twice here: once in dax_io(), and once again > > back at generic_file_read_iter(). > > > > I don't see how it ever worked. Am I missing something? This: struct iov_iter data = *iter; retval = mapping->a_ops->direct_IO(iocb, &data, pos); } if (retval > 0) { *ppos = pos + retval; iov_iter_advance(iter, retval); The iterator advanced in ->direct_IO() is a _copy_, not the original. The contents of *iter as seen by generic_file_read_iter() is not modifiable by ->direct_IO(), simply because its address is nowhere to be found. And checking iov_iter_count(iter) at the end of dax_io() is pointless - from the POV of generic_file_read_iter() it's data.count, and while it used to be equal to iter->count, it's already been modified. By the time we call iov_iter_advance() in generic_file_read_iter() that value will be already discarded, along with rest of struct iov_iter data. Wait a minute - you are triggering _what_??? > > + WARN_ON((pos == start) && (pos - start > iov_iter_count(iter))); With '&&'? iov_iter_count() is size_t, while pos and start are loff_t, so you are seeing equal values in pos and start (as integers) *and* (loff_t)0 > (size_t)something. loff_t is a signed type, size_t - unsigned. 6.3.1.8[1] says that * if rank of size_t is greater or equal to rank of loff_t, the latter gets converted to size_t. And conversion of zero should be zero, i.e. (size_t) 0 > (size_t)something, which is impossible (we compare them as non-negative integers). * if loff_t can represent all values of size_t, size_t value gets converted to loff_t. Result of conversion should have the same (in particular, non-negative) value. Again, comparison can't be true. * otherwise both values are converted to unsigned counterpart of loff_t. Again, zero converts to 0 and in any unsigned type 0 > x is impossible. I don't see any way for that condition to evaluate true. Assuming that it's a misquoted ||... I don't see any way for pos to get greater than start + original iov_iter_count(). However, I *do* see a way for bad things to happen in a different way. Look: // first pass through the loop, pos == start (and so's max) retval = dax_get_addr(bh, &addr, blkbits); // got a positive value if (retval < 0) break; // nope, keep going if (buffer_unwritten(bh) || buffer_new(bh)) { dax_new_buf(addr, retval, first, pos, end); need_wmb = true; } addr += first; size = retval - first; // OK... } max = min(pos + size, end); // OK... } if (iov_iter_rw(iter) == WRITE) { len = copy_from_iter_pmem(addr, max - pos, iter); need_wmb = true; } else if (!hole) len = copy_to_iter((void __force *)addr, max - pos, iter); else len = iov_iter_zero(max - pos, iter); // too bad - we'd hit an unmapped memory area. len is 0... // and retval is fucking positive. if (!len) break; return (pos == start) ? retval : pos - start; // will return a bloody big positive value Could you try to reproduce it with this: dax_io(): don't let non-error value escape via retval instead of EFAULT Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> --- diff --git a/fs/dax.c b/fs/dax.c index a86d3cc..7b653e9 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter, else len = iov_iter_zero(max - pos, iter); - if (!len) + if (!len) { + retval = -EFAULT; break; + } pos += len; addr += len; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html