On Tue, Dec 04, 2012 at 11:13:40AM -0800, John Stultz wrote: > On 12/03/2012 11:22 PM, Minchan Kim wrote: > >On Mon, Dec 03, 2012 at 04:57:20PM -0800, John Stultz wrote: > >>On 12/03/2012 04:00 PM, Minchan Kim wrote: > >>>On Wed, Nov 28, 2012 at 08:18:01PM -0800, John Stultz wrote: > >>>>On 11/21/2012 04:36 PM, John Stultz wrote: > >>>>>2) Being able to use this with tmpfs files. I'm currently trying > >>>>>to better understand the rmap code, looking to see if there's a > >>>>>way to have try_to_unmap_file() work similarly to > >>>>>try_to_unmap_anon(), to allow allow users to madvise() on mmapped > >>>>>tmpfs files. This would provide a very similar interface as to > >>>>>what I've been proposing with fadvise/fallocate, but just using > >>>>>process virtual addresses instead of (fd, offset) pairs. The > >>>>>benefit with (fd,offset) pairs for Android is that its easier to > >>>>>manage shared volatile ranges between two processes that are > >>>>>sharing data via an mmapped tmpfs file (although this actual use > >>>>>case may be fairly rare). I believe we should still be able to > >>>>>rework the ashmem internals to use madvise (which would provide > >>>>>legacy support for existing android apps), so then its just a > >>>>>question of if we could then eventually convince Android apps to > >>>>>use the madvise interface directly, rather then the ashmem unpin > >>>>>ioctl. > >>>>Hey Minchan, > >>>> I've been playing around with your patch trying to better > >>>>understand your approach and to extend it to support tmpfs files. In > >>>>doing so I've found a few bugs, and have some rough fixes I wanted > >>>>to share. There's still a few edge cases I need to deal with (the > >>>>vma-purged flag isn't being properly handled through vma merge/split > >>>>operations), but its starting to come along. > >>>Hmm, my patch doesn't allow to merge volatile with another one by > >>>inserting VM_VOLATILE into VM_SPECIAL so I guess merge isn't problem. > >>>In case of split, __split_vma copy old vma to new vma like this > >>> > >>> *new = *vma; > >>> > >>>So the problem shouldn't happen, I guess. > >>>Did you see the real problem about that? > >>Yes, depending on the pattern that MADV_VOLATILE and MADV_NOVOLATILE > >>is applied, we can get a result where data is purged, but we aren't > >>notified of it. Also, since madvise returns early if it encounters > >>an error, in the case where you have checkerboard volatile regions > >>(say every other page is volatile), which you mark non-volatile with > >>one large MADV_NOVOLATILE call, the first volatile vma will be > >>marked non-volatile, but since it returns purged, the madvise loop > >>will stop and the following volatile regions will be left volatile. > >> > >>The patches in the git tree below which handle the perged state > >>better seem to work for my tests, as far as resolving any > >>overlapping calls. Of course there may yet still be problems I've > >>not found. > >> > >>>>Anyway, take a look at the tree here and let me know what you think. > >>>>http://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=refs/heads/dev/minchan-anonvol > >>Eager to hear what you think! > >Below two patches look good to me. > > > >[rmap: Simplify volatility checking by moving it out of try_to_unmap_one] > >[rmap: ClearPageDirty() when returning SWAP_DISCARD] > > > >[madvise: Fix NOVOLATILE bug] > >I can't understand description of the patch. > >Could you elaborate it with example? > The case I ran into here is if you have a range where you mark every > other page as volatile. Then mark all the pages in that range as > non-volatile in one madvise call. > > sys_madvise() will then find the first vma in the range, and call > madvise_vma(), which marks the first vma non-volatile and return the > purged state. If the page has been purged, sys_madvise code will > note that as an error, and break out of the vma iteration loop, > leaving the following vmas in the range volatile. > > >[madvise: Fixup vma->purged handling] > >I included VM_VOLATILE into VM_SPECIAL intentionally. > >If comment of VM_SPECIAL is right, merge with volatile vmas shouldn't happen. > >So I guess you see other problem. When I see my source code today, locking > >scheme/purge handling is totally broken. I will look at it. Maybe you are seeing > >bug related that. Part of patch is needed. It could be separate patch. > >I will merge it. > I don't think the problem is when vmas being marked VM_VOLATILE are > being merged, its that when we mark the vma as *non-volatile*, and > remove the VM_VOLATILE flag we merge the non-volatile vmas with > neighboring vmas. So preserving the purged flag during that merge is > important. Again, the example I used to trigger this was an > alternating pattern of volatile and non volatile vmas, then marking > the entire range non-volatile (though sometimes in two overlapping > passes). Understood. Thanks. Below patch solves your problems? It's simple than yours. Anyway, both yours and mine are not right fix. As I mentioned, locking scheme is broken. We need anon_vma_lock to handle purged and we should consider fork case, too. diff --git a/mm/madvise.c b/mm/madvise.c index 965a53d..5fa3254 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -41,7 +41,8 @@ static int madvise_need_mmap_write(int behavior) */ static long madvise_behavior(struct vm_area_struct * vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end, int behavior) + unsigned long start, unsigned long end, + int behavior, bool *purged) { struct mm_struct * mm = vma->vm_mm; int error = 0; @@ -151,7 +152,7 @@ success: volatile_lock(vma); vma->vm_flags = new_flags; if (behavior == MADV_NOVOLATILE) { - error = vma->purged; + *purged |= vma->purged; vma->purged = false; } if (behavior == MADV_NOVOLATILE || behavior == MADV_VOLATILE) @@ -309,7 +310,7 @@ static int madvise_hwpoison(int bhv, unsigned long start, unsigned long end) static long madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end, int behavior) + unsigned long start, unsigned long end, int behavior, bool *purged) { switch (behavior) { case MADV_REMOVE: @@ -319,7 +320,7 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, case MADV_DONTNEED: return madvise_dontneed(vma, prev, start, end); default: - return madvise_behavior(vma, prev, start, end, behavior); + return madvise_behavior(vma, prev, start, end, behavior, purged); } } @@ -405,6 +406,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) int error = -EINVAL; int write; size_t len; + bool purged = false; #ifdef CONFIG_MEMORY_FAILURE if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE) @@ -468,7 +470,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) tmp = end; /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */ - error = madvise_vma(vma, &prev, start, tmp, behavior); + error = madvise_vma(vma, &prev, start, tmp, behavior, &purged); if (error) goto out; start = tmp; @@ -488,5 +490,7 @@ out: else up_read(¤t->mm->mmap_sem); + if (!error & purged) + error = 1; return error; } > > thanks > -john > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>