Re: Corruption with O_DIRECT and unaligned user buffers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Li Zefan wrote:
> KAMEZAWA Hiroyuki wrote:
>> On Thu, 18 Dec 2008 16:29:52 +0100
>> Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>>
>>> On Wed, Nov 19, 2008 at 05:58:19PM +0100, Andrea Arcangeli wrote:
>>>> On Wed, Nov 19, 2008 at 03:25:59PM +1100, Nick Piggin wrote:
>>>>> The solution either involves synchronising forks and get_user_pages,
>>>>> or probably better, to do copy on fork rather than COW in the case
>>>>> that we detect a page is subject to get_user_pages. The trick is in
>>>>> the details :)
>>> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>>> Subject: fork-o_direct-race
>>>
>>> Think a thread writing constantly to the last 512bytes of a page, while another
>>> thread read and writes to/from the first 512bytes of the page. We can lose
>>> O_DIRECT reads, the very moment we mark any pte wrprotected because a third
>>> unrelated thread forks off a child.
>>>
>>> This fixes it by never wprotecting anon ptes if there can be any direct I/O in
>>> flight to the page, and by instantiating a readonly pte and triggering a COW in
>>> the child. The only trouble here are O_DIRECT reads (writes to memory, read
>>> from disk). Checking the page_count under the PT lock guarantees no
>>> get_user_pages could be running under us because if somebody wants to write to
>>> the page, it has to break any cow first and that requires taking the PT lock in
>>> follow_page before increasing the page count.
>>>
>>> The COW triggered inside fork will run while the parent pte is read-write, this
>>> is not usual but that's ok as it's only a page copy and it doesn't modify the
>>> page contents.
>>>
>>> In the long term there should be a smp_wmb() in between page_cache_get and
>>> SetPageSwapCache in __add_to_swap_cache and a smp_rmb in between the
>>> PageSwapCache and the page_count() to remove the trylock op.
>>>
>>> Fixed version of original patch from Nick Piggin.
>>>
>>> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Confirmed this fixes the problem.
>>
> 
> We tested with RHEL 5.2 + patch on i386 using the test program provided by
> Tim LaBerge, though the program can pass but sometimes hanged. strace log is
> attached, and we'll test it again with LOCKDEP enabled to see if we can get
> some other information.
> 

# ./dma_thread -a 512

Using 2 workers.

Using alignment 512.

Read buffer: 0xb7e4e000.

Reading file 1.

Reading file 2.

...

Reading file 26.

Reading file 27.


(hang here, Ctrl+C can break the process)


And we modified the program to use 'dma_thread -a 512 -w 1', we can still see
hung in a very low frequency.

==============

Here is a snapshop of call trace:

dma_thread    S 00000035  2872 20296   8797         23256       (NOTLB)
       f7018e78 00000046 1f593e7d 00000035 f7018e84 00000002 00000000 00000006 
       f4c35530 f71ac030 1f5a03da 00000035 0000c55d 00000001 f4c3563c c1a80044 
       f7018f04 f7018f1c b7e4cbd8 00000046 00000000 00000002 00000001 7fffffff 
Call Trace:
 [<c061bd10>] schedule_timeout+0x13/0x8c
 [<c043c435>] do_futex+0x1e2/0xb38
 [<c061d316>] _spin_unlock+0x14/0x1c
 [<c0465938>] do_wp_page+0x3fb/0x405
 [<c0466da0>] __handle_mm_fault+0x858/0x8b8
 [<c041e5f3>] default_wake_function+0x0/0xc
 [<c044e32b>] audit_syscall_entry+0x14b/0x17d
 [<c043ce9c>] sys_futex+0x111/0x127
 [<c0408076>] do_syscall_trace+0xab/0xb1
 [<c0404f53>] syscall_call+0x7/0xb
 =======================
dma_thread    S 00000035  3304 23256   8797 23258         20296 (NOTLB)
       f4e24f50 00000046 1ec0e26d 00000035 c073ea10 416db065 00000046 00000003 
       f70ac030 c1b7eab0 1ec7d7c9 00000035 0006f55c 00000001 f70ac13c c1a80044 
       00005ada f4cc0030 00000000 00000246 ffffffff 00000000 00000000 f53acab0 
Call Trace:
 [<c0426b84>] do_wait+0x8b5/0x9a3
 [<c044e32b>] audit_syscall_entry+0x14b/0x17d
 [<c041e5f3>] default_wake_function+0x0/0xc
 [<c0426c99>] sys_wait4+0x27/0x2a
 [<c0426caf>] sys_waitpid+0x13/0x17
 [<c0404f53>] syscall_call+0x7/0xb
 =======================
dma_thread    R running  3412 23258  23256                     (NOTLB)
...
...
Showing all locks held in the system:
4 locks held by kseriod/82:
 #0:  (serio_mutex){--..}, at: [<c059c7f6>] serio_thread+0x13/0x28d
 #1:  (&serio->drv_mutex){--..}, at: [<c059be7e>] serio_connect_driver+0x16/0x2c
 #2:  (psmouse_mutex){--..}, at: [<c05a41ce>] psmouse_connect+0x18/0x211
 #3:  (&ps2_mutex_key){--..}, at: [<c059e4bd>] ps2_command+0x80/0x2dc

=============================================

> BTW, the patch works fine on IA64.
> 
>> Hmm, but, fork() gets slower. 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux