Re: Corruption with O_DIRECT and unaligned user buffers

On Thu, 18 Dec 2008 16:29:52 +0100
Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:

> On Wed, Nov 19, 2008 at 05:58:19PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 19, 2008 at 03:25:59PM +1100, Nick Piggin wrote:
> > > The solution either involves synchronising forks and get_user_pages,
> > > or probably better, to do copy on fork rather than COW in the case
> > > that we detect a page is subject to get_user_pages. The trick is in
> > > the details :)
> > 

> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Subject: fork-o_direct-race
> 
> Think of a thread writing constantly to the last 512 bytes of a page, while
> another thread reads and writes to/from the first 512 bytes of the page. We can
> lose O_DIRECT reads the very moment we mark any pte wrprotected because a third
> unrelated thread forks off a child.
> 
> This fixes it by never wrprotecting anon ptes if there can be any direct I/O in
> flight to the page, and by instantiating a readonly pte and triggering a COW in
> the child. The only trouble here is O_DIRECT reads (writes to memory, reads
> from disk). Checking the page_count under the PT lock guarantees no
> get_user_pages could be running under us, because if somebody wants to write to
> the page, it has to break any COW first, and that requires taking the PT lock in
> follow_page before increasing the page count.
> 
> The COW triggered inside fork will run while the parent pte is read-write. This
> is not usual, but that's ok as it's only a page copy and it doesn't modify the
> page contents.
> 
> In the long term there should be a smp_wmb() in between page_cache_get and
> SetPageSwapCache in __add_to_swap_cache and a smp_rmb in between the
> PageSwapCache and the page_count() to remove the trylock op.
> 
> Fixed version of original patch from Nick Piggin.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

Confirmed this fixes the problem.

Hmm, but fork() gets slower.

Cost of fork() on ia64, before and after the patch:
==
  size of memory   before    after
  Anon=1M          0.07ms    0.08ms
  Anon=10M         0.17ms    0.22ms
  Anon=100M        1.15ms    1.64ms
  Anon=1000M       11.5ms    15.821ms
==

fork() cost is about 135% of the unpatched cost when the process has 1G of Anon.

The test program is below. (I used "/usr/bin/time" for measurement.)
==
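#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset() */
#include <unistd.h>     /* fork() */
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
        size_t size;
        int i, status;
        char *c;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <size-in-MB>\n", argv[0]);
                return 1;
        }
        size = (size_t)atoi(argv[1]) * 1024 * 1024;
        c = malloc(size);
        if (!c)
                return 1;
        /* Touch every page so that fork() has anon ptes to copy. */
        memset(c, 0, size);
        /* Fork 5000 times; each child exits immediately. */
        for (i = 0; i < 5000; i++) {
                if (!fork())
                        exit(0);
                wait(&status);
        }
        return 0;
}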
==
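
If someone wants a per-fork number without dividing the /usr/bin/time
total by hand, a rough sketch like the one below may help. It is not what
produced the numbers in the table. The function name fork_cost_ms() and
its parameters are made up for illustration, and it times only the
parent-side fork() call with clock_gettime(), so child exit and wait()
are excluded and the absolute values will differ from the table. It fills
the buffer with a non-zero byte, on the assumption that an optimizing
compiler might otherwise turn malloc()+memset(0) into calloc() and leave
the pages untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of the original test): returns the
 * average time in milliseconds spent in fork() by a parent that holds
 * `mbytes` MB of touched anonymous memory, or -1.0 on failure.
 */
double fork_cost_ms(size_t mbytes, int iterations)
{
        size_t size = mbytes * 1024 * 1024;
        struct timespec t0, t1;
        double total_ns = 0.0;
        int i, status;
        char *c;

        if (iterations <= 0)
                return -1.0;
        c = malloc(size);
        if (!c)
                return -1.0;
        /* Non-zero fill: make sure every page is really instantiated. */
        memset(c, 1, size);

        for (i = 0; i < iterations; i++) {
                pid_t pid;

                clock_gettime(CLOCK_MONOTONIC, &t0);
                pid = fork();
                if (pid == 0)
                        _exit(0);       /* child: leave immediately */
                clock_gettime(CLOCK_MONOTONIC, &t1);
                if (pid < 0) {
                        free(c);
                        return -1.0;
                }
                waitpid(pid, &status, 0);
                total_ns += (t1.tv_sec - t0.tv_sec) * 1e9
                          + (t1.tv_nsec - t0.tv_nsec);
        }
        free(c);
        return total_ns / iterations / 1e6;
}
```

Calling fork_cost_ms(1000, 5000) from a small main() on a kernel with and
without the patch should give a comparable before/after pair, assuming
the machine has enough free memory for the 1000M case.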




