On Wed, Apr 2, 2014 at 11:07 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> On Wed, Apr 02, 2014 at 10:48:03AM -0700, John Stultz wrote:
>> I suspect handling the SIGBUS and patching up the purged page you
>> trapped on is likely much to complicated for most use cases. But I do
>> think SIGBUS is preferable to zero-fill on purged page access, just
>> because its likely to be easier to debug applications.
>
> Fully agreed, but it seems a bit overkill to add a separate syscall, a
> range-tree on top of shmem address_spaces, and an essentially new
> programming model based on SIGBUS userspace fault handling (incl. all
> the complexities and confusion this inevitably will bring when people
> DO end up passing these pointers into kernel space) just to be a bit
> nicer about use-after-free bugs in applications.

It's more about making an interface that has graspable semantics for
userspace, instead of having the semantics be a side effect of the
implementation.

Tying volatility to the page-clean state and page-was-purged to
page-present seems problematic to me, because there are too many ways
to change the page-clean or page-present state outside of the
interface being proposed. I feel this causes a cascade of corner cases
that have to be explained to users of the interface.

Also, I disagree that we're adding a new programming model, as SIGBUSes
can already be caught; it's just that there's usually not much one can
do, whereas with volatile pages it's more likely something could be
done. And again, it's really just a side effect of having semantics
(SIGBUS on purged page access) that are more helpful from an
application's perspective.

As for the separate syscall: Again, this is mainly needed to handle
allocation failures that happen midway through modifying the range.
There may still be a way to do the allocation first and only do the
modification after it succeeds.
The vma merge/splitting logic doesn't make this easy, but if we can be
sure that on a failed split of 1 vma -> 3 vmas (which may fail halfway)
we can re-merge w/o allocation and error out (without having to do any
other allocations), this might be avoidable. I'm still wanting to look
at this. If so, it would be easier to re-add this support under
madvise, if folks really really don't like the new syscall.

For the most part, having the separate syscall allows us to discuss
other details of the semantics, which to me are more important than
the syscall naming.

thanks
-john
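
For reference, here is a rough userspace sketch of the "SIGBUSes can
already be caught" point. It does not use the proposed volatile range
interface at all. As a stand-in for a purged page, it maps a temporary
file and then truncates it, which on Linux makes the next access to the
mapping raise SIGBUS. The helper names (on_sigbus, try_read_byte,
install_sigbus_handler, demo_lost_page) are made up for illustration,
and recovering with sigsetjmp/siglongjmp is just one possible approach.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;               /* recovery point for a guarded access */
static volatile sig_atomic_t bus_armed;  /* nonzero only while a guarded access runs */

static void on_sigbus(int sig)
{
    (void)sig;
    if (bus_armed)
        siglongjmp(bus_jmp, 1);          /* unwind back into try_read_byte() */
    _exit(1);                            /* SIGBUS from somewhere we did not expect */
}

/* Hypothetical helper: install the SIGBUS handler. Returns 0 on success. */
int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Returns 0 and stores the byte in *out, or -1 if the access raised SIGBUS. */
static int try_read_byte(const volatile char *p, char *out)
{
    if (sigsetjmp(bus_jmp, 1) != 0) {    /* we get here via siglongjmp */
        bus_armed = 0;
        return -1;
    }
    bus_armed = 1;
    *out = *p;                           /* may fault if the backing page is gone */
    bus_armed = 0;
    return 0;
}

/*
 * Stand-in for "purged page access": truncating the file removes the
 * backing page, so the second read faults. Returns 1 if the first read
 * succeeded and the second one was reported via SIGBUS, 0 if not, and
 * -1 on setup failure.
 */
int demo_lost_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    FILE *f = tmpfile();
    if (!f)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, page) != 0) {
        fclose(f);
        return -1;
    }

    char *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    char c = 0;
    map[0] = 'x';
    int before = try_read_byte(map, &c);              /* page still backed */
    int truncated = ftruncate(fd, 0);                 /* backing page goes away */
    int after = (truncated == 0) ? try_read_byte(map, &c) : 0;

    munmap(map, (size_t)page);
    fclose(f);
    return (before == 0 && after == -1) ? 1 : 0;
}
```

A caller would run install_sigbus_handler() once and then wrap each
access to possibly-lost memory in something like try_read_byte(). Even
in this toy form the handler needs global state and a non-local jump,
which is roughly why patching things up from the handler looks too
complicated for most use cases, while the fault itself is still easy to
notice and debug.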