On Thu, Aug 11, 2016 at 3:10 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Thu, Aug 11, 2016 at 09:46:51PM +0300, Tuomas Tynkkynen wrote: >> Greetings, >> >> I've noticed a corner case with writev() both modifying the file and >> returning -EFAULT at the same time. This happens on filesystems using >> generic_perform_write() (i.e. ext4, vfat) on 4.6.3 kernel and below, >> down to 3.16. Here's the reproducer: Yeah, if you have things that cause EFAULT, all bets are off. Bad things may have happened in the middle, and POSIX allows it. That said, from a quality-of-implementation standpoint, we've generally tried to do the robust thing, and return partial write results etc. So in general, EFAULT does mean "nothing happened", but there are exceptions: > We could, in principle, provide a non-zeroing variants of copy_from_user and > copy_from_user_inatomic and switch ->write()/->write_iter() instances to > use of those; however, I'm not sure if it's really worth bothering with. That would be nice, but it doesn't really work. When we write a whole page, we may have actively thrown the old values away. In other words, we saw that the user was going to overwrite the page, we didn't even read it in at all, knowing that it's all going to be overwritten. We now get a page fault in the middle, so we've overwritten part of the page, but the rest of the page is just garbage (and _not_ the old contents). End result: overwrite the garbage with zeroes, and take advantage of the fact that POSIX says that EFAULT is "undefined". We *can* handle the case of immediate page fault slightly better - the whole prefault thing notices that it gets no data at all, and it could just say "don't even look up the page in the first place". But it doesn't really help the more complicated "page fault in the middle" case, so on the whole I really think that we're better off just saying that page faults in the middle of a write will leave the file in an inconsistent state. If somebody wants to make it behave slightly better, and has a reasonable patch for it, I certainly won't argue against better semantics. But at the same time, the basic rule really is: "If you give bad virtual memory regions to system calls, you get to keep the resulting broken piece and blame yourself". anything the kernel does better is purely about us being polite, not about correctness or caring deeply. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html