On Sun, Aug 14, 2016 at 07:57:20PM +0300, Michael S. Tsirkin wrote: > On Sun, Aug 14, 2016 at 10:41:52AM +0200, Michal Hocko wrote: > > On Sat 13-08-16 03:15:00, Michael S. Tsirkin wrote: > > > On Fri, Aug 12, 2016 at 03:21:41PM +0200, Oleg Nesterov wrote: > > > > Whats really interesting is that I still fail to understand do we really > > > > need this hack, iiuc you are not sure too, and Michael didn't bother to > > > > explain why a bogus zero from anon memory is worse than other problems > > > > caused by SIGKKILL from oom-kill.c. > > > > > > vhost thread will die, but vcpu thread is going on. If it's memory is > > > corrupted because vhost read 0 and uses that as an array index, it can > > > do things like corrupt the disk, so it can't be restarted. > > > > > > But I really wish we didn't need this special-casing. Can't PTEs be > > > made invalid on oom instead of pointing them at the zero page? > > > > Well ptes are just made !present and the subsequent #PF will allocate > > a fresh new page which will be a zero page as the original content is > > gone already. > > Can't we set a flag to make fixups desist from faulting > in memory? > > > > But I am not really sure what you mean by an invalid > > pte. You are in a kernel thread context, aka unkillable context. How > > would you handle SIGBUS or whatever other signal as a result of the > > invalid access? > > No need for signal - each copy from user access is already > checked for errors. > > > > And then > > > won't memory accesses trigger pagefaults instead of returning 0? > > > > See above. Zero page is just result of the lost memory content. We > > cannot both reclaim and keep the original content. > > Isn't this what decides it's a valid address so > we need to bring in a page (in __do_page_fault)? > > > vma = find_vma(mm, address); > if (unlikely(!vma)) { > bad_area(regs, error_code, address); > return; > } > if (likely(vma->vm_start <= address)) > goto good_area; > if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) { > bad_area(regs, error_code, address); > return; > } > > > So why can't we check a flag here, and call bad_area? > then vhost will get an error from access to userspace > memory and can handle it correctly. > > > > > That > > > would make regular copy_from_user machinery do the right thing, > > > making vhost stop running as appropriate. > > > > I must be missing something here but how would you make the kernel > > thread context find out the invalid access. You would have to perform > > signal handling routine after every single memory access and I fail how > > this is any different from a special copy_from_user_mm. > > No because IIUC no checks are needed as long as there > is no fault. On fault, fixups are run, at the moment > they bring in a page, I am saying they should > behave as if an invalid address was accessed instead. > > > > -- > > Michal Hocko > > SUSE Labs So fundamentally, won't the following make copy to/from user return EFAULT? If yes, vhost is already prepared to handle that. diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index dc80230..e5dbee5 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1309,6 +1309,11 @@ retry: might_sleep(); } + if (unlikely(test_bit(MMF_UNSTABLE, &mm->flags))) { + bad_area(regs, error_code, address); + return; + } + vma = find_vma(mm, address); if (unlikely(!vma)) { bad_area(regs, error_code, address); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>