On Thu, Aug 11, 2016 at 09:46:51PM +0300, Tuomas Tynkkynen wrote: > Greetings, > > I've noticed a corner case with writev() both modifying the file and > returning -EFAULT at the same time. This happens on filesystems using > generic_perform_write() (i.e. ext4, vfat) on 4.6.3 kernel and below, > down to 3.16. Here's the reproducer: [fill 8Kb with 'A', then reopen the file and do writev() on two-element iovec array consisting of {NULL,0},{NULL,4096}] > Running that prints "writev(): Bad address" but also some NUL bytes have > appeared at the beginning file, in addition to the 'A's by the dd. > > 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| > * > 00001000 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA| > * > 00002000 > > Is that intented behaviour? There are two issues mixed here: one is that iovec array (NULL/0, NULL/4K) we get int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes) { if (!(i->type & (ITER_BVEC|ITER_KVEC))) { char __user *buf = i->iov->iov_base + i->iov_offset; bytes = min(bytes, i->iov->iov_len - i->iov_offset); return fault_in_pages_readable(buf, bytes); } return 0; } succeed. It shouldn't. However, even if it correctly failed, we still have another one - __copy_from_user_inatomic() zeroes what it hadn't copied over, so if we e.g. gave write(2) a buffer with unmapped page in the middle, we would get a short write *and* a bunch of bytes past the amount reported written replaced with zeroes. And _that_ goes very far back. 2.1.late definitely had already been that way. Note, BTW, that fixing iov_iter_fault_in_readable() won't exclude a possibility of write() returning -EFAULT and overwriting some bytes with zeroes; all you need is munmap() racing with write() and hitting after the successful fault-in. We could, in principle, provide a non-zeroing variants of copy_from_user and copy_from_user_inatomic and switch ->write()/->write_iter() instances to use of those; however, I'm not sure if it's really worth bothering with. I do see one potential argument in favour of doing that; suppose we have a buffer filled with 'A'. It starts one byte before a page boundary and all its pages are currently in swap. There's a file also filled with 'A' and mmapped by another process. We say write() from our buffer to that file. Having that other process observe transient zeroes appearing in its mmapped area is Not Nice(tm); having them stick around is even worse, and that can happen in case if the first process gets killed while it's faulting the next page of buffer in. Linus? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html