On Sat, 7 Apr 2012, Konstantin Khlebnikov wrote: > Hugh Dickins wrote: > > > > I've long detested that behaviour of GUP write,force, and my strong > > preference would be not to layer more strangeness upon strangeness, > > but limit the damage by making GUP write,force fail in that case, > > instead of inserting a PageAnon page into a VM_SHARED mapping. > > > > I think it's unlikely that it will cause a regression in real life > > (it already fails if you did not open the mmap'ed file for writing), > > but it would be a user-visible change in behaviour, and I've research > > to do before arriving at a conclusion. > > Agree, but this stuff is very weak. Even if sysctl vm.overcommit_memory=2, > probably we should fixup accounting in /proc/pid/mem only for this case, > because vm.overcommit_memory=2 supposed to protect against overcommit, but it > does not. You are right (and it's not the first time I've had to say so!). At first I was puzzled by your answer, then went back to your initial mail, which clearly says "Currently kernel does not account read-only private mappings into memory commitment. But these mappings can be force-COW-ed in get_user_pages()" and realized (a) that I'd forgotten about that weakness in the overcommit stuff, and (b) that I'd therefore let my obsession with the anon-in-shared issue blind me to what you were actually saying. "private", yes, sorry about that. Let's set aside the shared issue for the moment, then, though I'll need to answer Oleg after (and writing to /proc/pid/mem makes that more serious than before too). Yes, GUP force-write into private-readonly can violate no-overcommit. Force-write was originally intended just for setting breakpoints with ptrace(2), and we've taken the view that fixing the no-overcommit issue is simply more trouble than it's worth. But I agree with you that once writing to /proc/pid/mem was enabled a year ago, it became a more serious defect: I don't think /proc/pid/mem makes anything new possible, but it does make it easier - I imagine dd(1) could achieve the same as your little program. I was hoping to find that writing to /proc/pid/mem was enabled solely because "why not?", and the 198214a7ee commit message does suggest so; but I see there was a 0/12 mail which never reached git, which makes clear that it was intended for debuggers to use instead of ptrace(2). So I think my first reaction, to disallow write-force on readonly via /proc/pid/mem, would not be helpful. I think all solutions to this are unsatifactory, and most ugly. I'd better pull up your original mail to review in detail, but I'd say yours is no exception: ugly and unsatisfactory, but quite possibly less ugly and less unsatisfactory than most. I was quite happy to be doing nothing at all about this, but now that you've raised the matter, I can understand that you won't want it to rest there. Of course, the issue can be dealt with by an additional data structure; but extending vm_area_struct or anon_vma (if only by one usually-NULL pointer), and adding the code to handle it, would itself be unsatisfactory - and I guess you felt the same. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>