Re: [Bug 117731] New: Doing mprotect for PROT_NONE and then for PROT_READ|PROT_WRITE reduces CPU write B/W on buffer

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

Great bug report, thanks.

I assume the breakage was caused by

commit 64e455079e1bd7787cc47be30b7f601ce682a5f6
Author:     Peter Feiner <pfeiner@xxxxxxxxxx>
AuthorDate: Mon Oct 13 15:55:46 2014 -0700
Commit:     Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
CommitDate: Tue Oct 14 02:18:28 2014 +0200

    mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
    

Could someone (Peter, Kirill?) please take a look?

On Fri, 06 May 2016 13:15:19 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=117731
> 
>             Bug ID: 117731
>            Summary: Doing mprotect for PROT_NONE and then for
>                     PROT_READ|PROT_WRITE reduces CPU write B/W on buffer
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.18 and beyond
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: ashish0srivastava0@xxxxxxxxx
>         Regression: No
> 
> Created attachment 215401
>   --> https://bugzilla.kernel.org/attachment.cgi?id=215401&action=edit
> Repro code
> 
> This is a regression that is present in kernel 3.18 and beyond and not in
> previous ones.
> Attached is a simple repro case. It measures the time taken to write and then
> read all pages in a buffer, then calls mprotect with PROT_NONE followed by
> mprotect with PROT_READ|PROT_WRITE, and then measures the time taken to write
> and read all pages in the buffer again. The second measurement is much larger
> (20 to 30 times) than the first one.
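
For readers without access to the attachment, the measurement described above
can be sketched roughly as follows. This is not the attached repro code: the
helper names touch_pages() and run_repro(), the warm-up pass, and the choice of
mapping flags and page count are all assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write one byte to every page, then read one byte
 * from every page, and return the elapsed time in seconds. */
static double touch_pages(volatile unsigned char *buf, size_t len, size_t page)
{
    struct timespec t0, t1;
    unsigned long sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < len; off += page)
        buf[off] = (unsigned char)off;          /* write pass */
    for (size_t off = 0; off < len; off += page)
        sum += buf[off];                        /* read pass */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical driver: time the buffer before and after the
 * PROT_NONE -> PROT_READ|PROT_WRITE round trip. map_flags is left to the
 * caller because the mapping type used by the real repro is not shown here.
 * Returns 0 on success, -1 on failure. */
static int run_repro(int map_flags, size_t npages, double *before, double *after)
{
    long ps = sysconf(_SC_PAGESIZE);
    int rc = 0;

    assert(before != NULL && after != NULL);
    if (ps <= 0)
        return -1;

    size_t page = (size_t)ps;
    size_t len = npages * page;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              map_flags, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Assumption: fault in and dirty every page first, so that the first
     * timed pass does not include the initial page faults. */
    touch_pages(buf, len, page);
    *before = touch_pages(buf, len, page);

    if (mprotect(buf, len, PROT_NONE) != 0 ||
        mprotect(buf, len, PROT_READ | PROT_WRITE) != 0)
        rc = -1;
    else
        *after = touch_pages(buf, len, page);

    munmap(buf, len);
    return rc;
}
```

A caller could invoke, for example, run_repro(MAP_SHARED | MAP_ANONYMOUS,
npages, &before, &after) and compare the two times. Whether a slowdown shows up
will depend on the kernel version and on the mapping type.
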
> 
> I have looked at the code in the kernel tree that is causing this and it is
> because writes are causing faults, as pte_mkwrite is not being done during
> mprotect_fixup for PROT_READ|PROT_WRITE.
> 
> This is the code inside mprotect_fixup in a tree v3.16.35 or older:
>     /*
>      * vm_flags and vm_page_prot are protected by the mmap_sem
>      * held in write mode.
>      */
>     vma->vm_flags = newflags;
>     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
>                       vm_get_page_prot(newflags));
> 
>     if (vma_wants_writenotify(vma)) {
>         vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
>         dirty_accountable = 1;
>     }
> This is the code in the same region inside mprotect_fixup in a recent tree:
>     /*
>      * vm_flags and vm_page_prot are protected by the mmap_sem
>      * held in write mode.
>      */
>     vma->vm_flags = newflags;
>     dirty_accountable = vma_wants_writenotify(vma);
>     vma_set_page_prot(vma);
> 
> The difference is in how dirty_accountable is set. The result of
> vma_wants_writenotify depends not only on vma->vm_flags but also on
> vma->vm_page_prot, and the following check makes it return 0, because the
> newer code sets dirty_accountable before it updates vma->vm_page_prot.
>     /* The open routine did something to the protections that pgprot_modify
>      * won't preserve? */
>     if (pgprot_val(vma->vm_page_prot) !=
>         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
>         return 0;
> 
> Now, suppose we change the code to call vma_set_page_prot before setting
> dirty_accountable:
>     vma->vm_flags = newflags;
>     vma_set_page_prot(vma);
>     dirty_accountable = vma_wants_writenotify(vma);
> Still, dirty_accountable will be 0. This is because the following code in
> vma_set_page_prot modifies vma->vm_page_prot without modifying vma->vm_flags:
>     if (vma_wants_writenotify(vma)) {
>         vm_flags &= ~VM_SHARED;
>         vma->vm_page_prot = vm_pgprot_modify(vma->vm_page_prot,
>                              vm_flags);
>     }
> so this check in vma_wants_writenotify will again return 0:
>     /* The open routine did something to the protections that pgprot_modify
>      * won't preserve? */
>     if (pgprot_val(vma->vm_page_prot) !=
>         pgprot_val(vm_pgprot_modify(vma->vm_page_prot, vm_flags)))
>         return 0;
> So dirty_accountable is still 0.
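
The ordering argument above can be checked with a small user-space model. This
is a toy sketch, not kernel code: every toy_* name and PTE_* value is
hypothetical, page protections are collapsed to three states, and the model
assumes a shared writable mapping whose backing wants dirty accounting. The
other checks in the real vma_wants_writenotify are left out.

```c
#include <assert.h>

#define VM_READ   0x1u
#define VM_WRITE  0x2u
#define VM_SHARED 0x8u

enum { PTE_NONE, PTE_RO, PTE_RW };   /* toy stand-ins for pgprot values */

struct toy_vma { unsigned flags; int page_prot; };

/* Toy vm_get_page_prot(): only shared writable mappings get writable PTEs. */
static int toy_get_page_prot(unsigned flags)
{
    if (!(flags & (VM_READ | VM_WRITE)))
        return PTE_NONE;
    if ((flags & VM_WRITE) && (flags & VM_SHARED))
        return PTE_RW;
    return PTE_RO;
}

/* Toy vma_wants_writenotify(): keeps only the flag test and the
 * "page_prot matches flags" test quoted in the report. */
static int toy_wants_writenotify(const struct toy_vma *vma)
{
    if ((vma->flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return 0;
    if (vma->page_prot != toy_get_page_prot(vma->flags))
        return 0;
    return 1;
}

/* Toy vma_set_page_prot(): drops VM_SHARED from the protection when
 * write notification is wanted, without touching vma->flags. */
static void toy_set_page_prot(struct toy_vma *vma)
{
    vma->page_prot = toy_get_page_prot(vma->flags);
    if (toy_wants_writenotify(vma))
        vma->page_prot = toy_get_page_prot(vma->flags & ~VM_SHARED);
}

/* Ordering from the older tree quoted in the report. */
static int fixup_old(struct toy_vma *vma, unsigned newflags)
{
    int dirty_accountable = 0;

    vma->flags = newflags;
    vma->page_prot = toy_get_page_prot(newflags);
    if (toy_wants_writenotify(vma)) {
        vma->page_prot = toy_get_page_prot(newflags & ~VM_SHARED);
        dirty_accountable = 1;
    }
    return dirty_accountable;
}

/* Ordering from the recent tree quoted in the report. */
static int fixup_recent(struct toy_vma *vma, unsigned newflags)
{
    int dirty_accountable;

    vma->flags = newflags;
    dirty_accountable = toy_wants_writenotify(vma);  /* page_prot is stale */
    toy_set_page_prot(vma);
    return dirty_accountable;
}

/* The reordering considered in the report. */
static int fixup_reordered(struct toy_vma *vma, unsigned newflags)
{
    vma->flags = newflags;
    toy_set_page_prot(vma);
    return toy_wants_writenotify(vma);  /* page_prot already lost VM_SHARED */
}

/* Start each case from the state left by mprotect(PROT_NONE) and apply
 * PROT_READ|PROT_WRITE. */
static void toy_self_check(void)
{
    const unsigned rw = VM_READ | VM_WRITE | VM_SHARED;
    struct toy_vma a = { VM_SHARED, PTE_NONE };
    struct toy_vma b = a, c = a;

    assert(fixup_old(&a, rw) == 1);
    assert(fixup_recent(&b, rw) == 0);
    assert(fixup_reordered(&c, rw) == 0);
}
```

In this model only the older ordering reports dirty_accountable as 1, which
matches the behaviour described in the report.
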
> 
> This code in change_pte_range decides whether to call pte_mkwrite or not:
>             /* Avoid taking write faults for known dirty pages */
>             if (dirty_accountable && pte_dirty(ptent) &&
>                     (pte_soft_dirty(ptent) ||
>                      !(vma->vm_flags & VM_SOFTDIRTY))) {
>                 ptent = pte_mkwrite(ptent);
>             }
> If dirty_accountable is 0 even though the pte was dirty already, pte_mkwrite
> will not be done.
> 
> I think the correct solution is to set dirty_accountable from the value that
> vma_wants_writenotify returns before vma->vm_page_prot is updated with
> VM_SHARED removed from the flags. One way to do so would be to have
> vma_set_page_prot return the dirty_accountable value, which it can record
> right after its vma_wants_writenotify check. Another way would be to do
>     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
>                       vm_get_page_prot(newflags));
> and then set dirty_accountable based on vma_wants_writenotify and then call
> vma_set_page_prot.
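
Both suggestions can be tried out in a toy user-space model before touching the
kernel. The sketch below is illustrative only and is not a proposed patch: all
toy_* names and PTE_* values are hypothetical, protections are collapsed to
three states, and the model assumes a shared writable mapping whose backing
wants dirty accounting.

```c
#include <assert.h>

#define VM_READ   0x1u
#define VM_WRITE  0x2u
#define VM_SHARED 0x8u

enum { PTE_NONE, PTE_RO, PTE_RW };   /* toy stand-ins for pgprot values */

struct toy_vma { unsigned flags; int page_prot; };

/* Toy vm_get_page_prot(): only shared writable mappings get writable PTEs. */
static int toy_get_page_prot(unsigned flags)
{
    if (!(flags & (VM_READ | VM_WRITE)))
        return PTE_NONE;
    if ((flags & VM_WRITE) && (flags & VM_SHARED))
        return PTE_RW;
    return PTE_RO;
}

/* Toy vma_wants_writenotify(): flag test plus the page_prot consistency test. */
static int toy_wants_writenotify(const struct toy_vma *vma)
{
    if ((vma->flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return 0;
    return vma->page_prot == toy_get_page_prot(vma->flags);
}

/* First suggestion: the set_page_prot step reports what its own
 * writenotify check returned, before VM_SHARED is masked out. */
static int toy_set_page_prot_report(struct toy_vma *vma)
{
    int wants;

    vma->page_prot = toy_get_page_prot(vma->flags);
    wants = toy_wants_writenotify(vma);
    if (wants)
        vma->page_prot = toy_get_page_prot(vma->flags & ~VM_SHARED);
    return wants;
}

static int fixup_proposal_1(struct toy_vma *vma, unsigned newflags)
{
    vma->flags = newflags;
    return toy_set_page_prot_report(vma);
}

/* Second suggestion: refresh page_prot from the new flags, query
 * writenotify, and only then run the set_page_prot step. */
static int fixup_proposal_2(struct toy_vma *vma, unsigned newflags)
{
    int dirty_accountable;

    vma->flags = newflags;
    vma->page_prot = toy_get_page_prot(newflags);
    dirty_accountable = toy_wants_writenotify(vma);
    (void)toy_set_page_prot_report(vma);
    return dirty_accountable;
}

/* Start from the state left by mprotect(PROT_NONE), then apply
 * PROT_READ|PROT_WRITE with each variant. */
static void toy_self_check(void)
{
    const unsigned rw = VM_READ | VM_WRITE | VM_SHARED;
    struct toy_vma a = { VM_SHARED, PTE_NONE };
    struct toy_vma b = a;

    assert(fixup_proposal_1(&a, rw) == 1 && a.page_prot == PTE_RO);
    assert(fixup_proposal_2(&b, rw) == 1 && b.page_prot == PTE_RO);
}
```

In the model both variants report dirty_accountable as 1 while still leaving
the toy protection write-protected. Whether either is the right change for the
real mprotect_fixup is a separate question.
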
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.
