Re: Weird writev() behaviour on EFAULT - also successfully modifying the file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 11, 2016 at 3:10 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Aug 11, 2016 at 09:46:51PM +0300, Tuomas Tynkkynen wrote:
>> Greetings,
>>
>> I've noticed a corner case with writev() both modifying the file and
>> returning -EFAULT at the same time. This happens on filesystems using
>> generic_perform_write() (i.e. ext4, vfat) on 4.6.3 kernel and below,
>> down to 3.16. Here's the reproducer:

Yeah, if you have things that cause EFAULT, all bets are off.

Bad things may have happened in the middle, and POSIX allows it.

That said, from a quality-of-implementation standpoint, we've
generally tried to do the robust thing, and return partial write
results etc. So in general, EFAULT does mean "nothing happened", but
there are exceptions:

> We could, in principle, provide a non-zeroing variants of copy_from_user and
> copy_from_user_inatomic and switch ->write()/->write_iter() instances to
> use of those; however, I'm not sure if it's really worth bothering with.

That would be nice, but it doesn't really work.

When we write a whole page, we may have actively thrown the old values
away. In other words, we saw that the user was going to overwrite the
page, we didn't even read it in at all, knowing that it's all going to
be overwritten.

We now get a page fault in the middle, so we've overwritten part of
the page, but the rest of the page is just garbage (and _not_ the old
contents). End result: overwrite the garbage with zeroes, and take
advantage of the fact that POSIX says that EFAULT is "undefined".

We *can* handle the case of immediate page fault slightly better - the
whole prefault thing notices that it gets no data at all, and it could
just say "don't even look up the page in the first place". But it
doesn't really help the more complicated "page fault in the middle"
case, so on the whole I really think that we're better off just saying
that page faults in the middle of a write will leave the file in an
inconsistent state.

If somebody wants to make it behave slightly better, and has a
reasonable patch for it, I certainly won't argue against better
semantics.

But at the same time, the basic rule really is:

 "If you give bad virtual memory regions to system calls, you get to
keep the resulting broken piece and blame yourself".

anything the kernel does better is purely about us being polite, not
about correctness or caring deeply.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux