Re: [PATCH 09/10] vhost, mm: make sure that oom_reaper doesn't reap memory read by vhost

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Sun, 14 Aug 2016 19:57:20 +0300

On Sun, Aug 14, 2016 at 10:41:52AM +0200, Michal Hocko wrote:
> On Sat 13-08-16 03:15:00, Michael S. Tsirkin wrote:
> > On Fri, Aug 12, 2016 at 03:21:41PM +0200, Oleg Nesterov wrote:
> > > What's really interesting is that I still fail to understand whether
> > > we really need this hack. IIUC you are not sure either, and Michael
> > > didn't bother to explain why a bogus zero from anon memory is worse
> > > than other problems caused by SIGKILL from oom-kill.c.
> > 
> > The vhost thread will die, but the vcpu thread keeps going.  If its
> > memory is corrupted because vhost read 0 and used that as an array
> > index, it can do things like corrupt the disk, so it can't be restarted.
> > 
> > But I really wish we didn't need this special-casing.  Can't PTEs be
> > made invalid on oom instead of pointing them at the zero page?
> 
> Well ptes are just made !present and the subsequent #PF will allocate
> a fresh new page which will be a zero page as the original content is
> gone already.

Can't we set a flag to make fixups desist from faulting
in memory?

> But I am not really sure what you mean by an invalid
> pte. You are in a kernel thread context, aka unkillable context. How
> would you handle SIGBUS or whatever other signal as a result of the
> invalid access?

No need for a signal - each copy-from-user access is already
checked for errors.

> > And then
> > won't memory accesses trigger pagefaults instead of returning 0?
> 
> See above. Zero page is just result of the lost memory content. We
> cannot both reclaim and keep the original content.

Isn't this what decides it's a valid address so
we need to bring in a page (in __do_page_fault)?

        vma = find_vma(mm, address);
        if (unlikely(!vma)) {
                bad_area(regs, error_code, address);
                return;
        }
        if (likely(vma->vm_start <= address))
                goto good_area;
        if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
                bad_area(regs, error_code, address);
                return;
        }

So why can't we check a flag here and call bad_area?
Then vhost will get an error from its access to userspace
memory and can handle it correctly.
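
A minimal userspace sketch of that check, assuming a per-mm flag that is
set once the oom reaper has run. Every name here (model_mm,
MODEL_MM_REAPED, model_classify_fault) is hypothetical and only mirrors
the shape of the excerpt above; it is not kernel code, and stack
expansion is assumed to always succeed.

```c
#include <stddef.h>

/* Userspace model only; none of these names are kernel APIs. */
#define MODEL_VM_GROWSDOWN 0x1u
#define MODEL_MM_REAPED    0x1u  /* hypothetical "oom reaper ran" flag */

enum model_fault { MODEL_GOOD_AREA, MODEL_BAD_AREA };

struct model_vma {
        unsigned long vm_start;  /* inclusive */
        unsigned long vm_end;    /* exclusive */
        unsigned int vm_flags;
};

struct model_mm {
        const struct model_vma *vmas;  /* sorted by address, non-overlapping */
        size_t nr_vmas;
        unsigned int flags;
};

/* Like find_vma(): first VMA whose end lies above addr, or NULL. */
static const struct model_vma *model_find_vma(const struct model_mm *mm,
                                              unsigned long addr)
{
        for (size_t i = 0; i < mm->nr_vmas; i++)
                if (addr < mm->vmas[i].vm_end)
                        return &mm->vmas[i];
        return NULL;
}

static enum model_fault model_classify_fault(const struct model_mm *mm,
                                             unsigned long addr)
{
        const struct model_vma *vma;

        /* The proposed extra check: a reaped mm never faults pages back in. */
        if (mm->flags & MODEL_MM_REAPED)
                return MODEL_BAD_AREA;

        vma = model_find_vma(mm, addr);
        if (!vma)
                return MODEL_BAD_AREA;
        if (vma->vm_start <= addr)
                return MODEL_GOOD_AREA;
        if (!(vma->vm_flags & MODEL_VM_GROWSDOWN))
                return MODEL_BAD_AREA;
        /* Simplification: stack expansion is assumed to succeed. */
        return MODEL_GOOD_AREA;
}
```

With the flag clear, an address inside a VMA is classified as a good
area; with the flag set, the same address is treated as a bad area, so
the access fails instead of getting a fresh zero page.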

> > That
> > would make regular copy_from_user machinery do the right thing,
> > making vhost stop running as appropriate.
> 
> I must be missing something here, but how would you make the kernel
> thread context find out about the invalid access? You would have to
> perform a signal handling routine after every single memory access, and
> I fail to see how this is any different from a special copy_from_user_mm.

No, because IIUC no checks are needed as long as there
is no fault. On a fault, fixups are run. At the moment
they bring in a page; I am saying they should
behave as if an invalid address was accessed instead.
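
A minimal userspace sketch of what that would mean for a caller, assuming
the copy helper follows the copy_from_user() convention of returning the
number of bytes not copied. The names (model_user_mem,
model_copy_from_user, model_read_index) are hypothetical and are not
vhost or kernel APIs; the "reaped" field just stands in for memory the
oom reaper has already taken away.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; names are hypothetical, not vhost or kernel APIs. */
struct model_user_mem {
        const unsigned char *data;
        size_t len;
        bool reaped;  /* stands in for "the oom reaper unmapped this memory" */
};

/* Mimics the copy_from_user() convention: returns bytes NOT copied. */
static size_t model_copy_from_user(void *dst, const struct model_user_mem *src,
                                   size_t off, size_t n)
{
        /* Fault: report failure instead of faulting in a zero page. */
        if (src->reaped || off > src->len || n > src->len - off)
                return n;
        memcpy(dst, src->data + off, n);
        return 0;
}

/* Reads an index from "guest" memory; fails rather than trusting a bogus 0. */
static int model_read_index(const struct model_user_mem *mem, size_t off,
                            unsigned short *idx)
{
        if (model_copy_from_user(idx, mem, off, sizeof(*idx)))
                return -EFAULT;
        return 0;
}
```

The caller only pays for the error check it already performs on every
copy; no extra per-access work is added on the path where nothing
faults.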

> -- 
> Michal Hocko
> SUSE Labs
