On Sun, Aug 14, 2016 at 10:41:52AM +0200, Michal Hocko wrote: > On Sat 13-08-16 03:15:00, Michael S. Tsirkin wrote: > > On Fri, Aug 12, 2016 at 03:21:41PM +0200, Oleg Nesterov wrote: > > > Whats really interesting is that I still fail to understand do we really > > > need this hack, iiuc you are not sure too, and Michael didn't bother to > > > explain why a bogus zero from anon memory is worse than other problems > > > caused by SIGKKILL from oom-kill.c. > > > > vhost thread will die, but vcpu thread is going on. If it's memory is > > corrupted because vhost read 0 and uses that as an array index, it can > > do things like corrupt the disk, so it can't be restarted. > > > > But I really wish we didn't need this special-casing. Can't PTEs be > > made invalid on oom instead of pointing them at the zero page? > > Well ptes are just made !present and the subsequent #PF will allocate > a fresh new page which will be a zero page as the original content is > gone already. Can't we set a flag to make fixups desist from faulting in memory? > But I am not really sure what you mean by an invalid > pte. You are in a kernel thread context, aka unkillable context. How > would you handle SIGBUS or whatever other signal as a result of the > invalid access? No need for signal - each copy from user access is already checked for errors. > > And then > > won't memory accesses trigger pagefaults instead of returning 0? > > See above. Zero page is just result of the lost memory content. We > cannot both reclaim and keep the original content. Isn't this what decides it's a valid address so we need to bring in a page (in __do_page_fault)? vma = find_vma(mm, address); if (unlikely(!vma)) { bad_area(regs, error_code, address); return; } if (likely(vma->vm_start <= address)) goto good_area; if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) { bad_area(regs, error_code, address); return; } So why can't we check a flag here, and call bad_area? then vhost will get an error from access to userspace memory and can handle it correctly. > > That > > would make regular copy_from_user machinery do the right thing, > > making vhost stop running as appropriate. > > I must be missing something here but how would you make the kernel > thread context find out the invalid access. You would have to perform > signal handling routine after every single memory access and I fail how > this is any different from a special copy_from_user_mm. No because IIUC no checks are needed as long as there is no fault. On fault, fixups are run, at the moment they bring in a page, I am saying they should behave as if an invalid address was accessed instead. > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>