Re: Weird writev() behaviour on EFAULT - also successfully modifying the file

On Thu, Aug 11, 2016 at 03:39:22PM -0700, Linus Torvalds wrote:

> But at the same time, the basic rule really is:
> 
>  "If you give bad virtual memory regions to system calls, you get to
> keep the resulting broken piece and blame yourself".
> 
> anything the kernel does better is purely about us being polite, not
> about correctness or caring deeply.

Yes, but... it doesn't need to be a bad region at all.  Look: we have a 20Kb
array of char starting at 0x....fff.  We feed it to write().  Everything
is mapped, etc. - no EFAULT in sight.  However, it is all swapped out at
the moment.  And somebody else has that file mmapped; again, no pathological
cases, not even in the same address space as writer, etc.  File contains no
zero bytes.  Neither does the buffer we are writing.  File position is 0 and
file is considerably longer than 20Kb.

We do a fault-in; fine, the first page (with one byte of useful data) is
swapped in.  We call ->write_begin(), then __copy_from_user_inatomic()
(with pagefaults disabled) the first 4Kb into the page with index 0 in
the file's page cache.  Copy fails after 1 byte, since the next page
is currently still swapped out.  We advance by 1 byte and fault that page
in; fine, now we'll copy 4095 bytes successfully, advance by 4095 and
proceed to writing into the page with index 1 in file's page cache, etc.
In the end everything works fine - no short writes, no EFAULT, all the
data copied into file.
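
To make the arithmetic above concrete, here is a minimal sketch of that loop as a toy model in Python.  It is not the kernel code: write_steps() and the address 0x10fff are made up for illustration, and it assumes 4Kb pages, a buffer whose page offset is 0xfff (so exactly one byte sits in its first page), every user page swapped out at the start, and a fault-in that only brings in the page holding the first byte of the chunk.

```python
PAGE = 4096  # assumed page size

def write_steps(buf_addr, length, pos=0):
    """Toy model of the copy loop described above (hypothetical helper).

    Returns a list of (file_pos, attempted, copied) tuples, one per
    copy attempt.
    """
    resident = set()   # user page numbers that are currently swapped in
    steps = []
    done = 0
    while done < length:
        # One chunk never crosses a page-cache page boundary.
        offset = (pos + done) % PAGE
        chunk = min(PAGE - offset, length - done)
        src = buf_addr + done

        # Fault-in: assumed to bring in only the page holding the first byte.
        resident.add(src // PAGE)

        # "Atomic" copy: stops at the first user page that is not resident.
        copied = 0
        while copied < chunk and (src + copied) // PAGE in resident:
            to_page_end = PAGE - (src + copied) % PAGE
            copied += min(to_page_end, chunk - copied)

        steps.append((pos + done, chunk, copied))
        done += copied   # advance by what was actually copied
    return steps

if __name__ == "__main__":
    for file_pos, attempted, copied in write_steps(0x10fff, 20 * 1024):
        print(f"pos={file_pos:5d} attempted={attempted:4d} copied={copied:4d}")
```

Under those assumptions every page-cache page takes two attempts, a short copy of 1 byte followed by a full copy of the remaining 4095, and the copied counts add up to the full 20Kb.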

However, _during_ the write the other process had seen something very odd -
it had mmapped a zero-free file, it knows that nobody had been writing any
zero-containing data into it, but it had seen zeroes come and go in the
mmapped area.  That "copied 1 byte" is actually "copied 1 byte, zeroed the
next 4095 bytes".  Sure, it's followed by "copy the next 4095 bytes over
those zeroes", but only after the next chunk of buffer got swapped in.  And
if the writer got killed, things are even nastier - these zeroes are
*not* overwritten by subsequent data.
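
What the other process gets to see can be sketched the same way.  Again a toy model in Python, not kernel code: short_copy() and simulate() are hypothetical names, the page-cache page is a plain bytearray, and the "copy what you can, zero the rest of the attempted range" behaviour is taken from the description above rather than from any particular kernel source.

```python
PAGE = 4096  # assumed page size

def short_copy(page, offset, data, attempted):
    """Copy len(data) bytes into the page at offset, then zero the rest
    of the attempted range, as in the short copy described above."""
    copied = len(data)
    page[offset:offset + copied] = data
    page[offset + copied:offset + attempted] = bytes(attempted - copied)
    return copied

def simulate(kill_after_first_copy=False):
    """Return what an mmap observer could see after each copy attempt
    into page 0 of the file (hypothetical model)."""
    page = bytearray(b"F" * PAGE)   # old file contents, no zero bytes
    buf = b"B" * PAGE               # data being written, no zero bytes
    snapshots = []

    # Attempt 1: 4096 bytes wanted, only 1 byte of the buffer is resident.
    short_copy(page, 0, buf[:1], PAGE)
    snapshots.append(bytes(page))
    if kill_after_first_copy:
        return snapshots            # writer died; nobody fixes up the tail

    # Attempt 2: next buffer page is in now, remaining 4095 bytes go through.
    short_copy(page, 1, buf[1:PAGE], PAGE - 1)
    snapshots.append(bytes(page))
    return snapshots

if __name__ == "__main__":
    for i, snap in enumerate(simulate(), 1):
        print(f"after attempt {i}: {snap.count(0)} zero bytes visible")
    killed = simulate(kill_after_first_copy=True)
    print(f"writer killed: {killed[-1].count(0)} zero bytes left in the file")
```

In this model the observer sees 4095 zero bytes between the two attempts and none at the end; with the writer killed after the first attempt, those 4095 zeroes stay in the page.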

Again, it's about as tame a case as they come - no NULLs, no EFAULT, no
buffers mmapped from the file we are writing to; just a normal write()
replacing the data in the very beginning of file from a buffer that isn't
page-aligned...