Re: [PATCH] mm: Loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/19/2015 09:22 AM, Christian Borntraeger wrote:
On 11/18/2015 02:31 PM, Vlastimil Babka wrote:
[CC += linux-api@xxxxxxxxxxxxxxx]
Anyway, I agree that it doesn't make sense to fail madvise when the given flag
is already set. On the other hand, I don't think the userspace app should fail
just because of madvise failing? It should in general be an advice that the
kernel is also strictly speaking free to ignore as it shouldn't affect
correctnes, just performance. Yeah, there are exceptions today like
MADV_DONTNEED, but that shouldn't apply to hugepages?
So I think Qemu needs fixing too.

yes, I agree. David, Juan. I think The postcopy code should not fail if the madvise.
Can you fix that?

  Also what happens if the kernel is build
without CONFIG_TRANSPARENT_HUGEPAGE? Then madvise also returns EINVAL,

Does it? To me it looks more like we would trigger a kernel bug.

mm/madvise.c:
         case MADV_HUGEPAGE:
         case MADV_NOHUGEPAGE:
                 error = hugepage_madvise(vma, &new_flags, behavior);  <-----
                 if (error)
                         goto out;
                 break;
         }


include/linux/huge_mm.h:
static inline int hugepage_madvise(struct vm_area_struct *vma,
                                    unsigned long *vm_flags, int advice)
{
         BUG();
         return 0;
}

If we just remove the BUG() statement the code would actually be correct
in simply ignoring an MADVISE it cannot handle. If you agree, I can
spin a patch.

Yeah this looks suspicious at first, but the code is not reachable thanks to madvise_behavior_valid() returning false:

...
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
        case MADV_HUGEPAGE:
        case MADV_NOHUGEPAGE:
#endif
        case MADV_DONTDUMP:
        case MADV_DODUMP:
                return true;

        default:
                return false;

I think the BUG() is pointless (KSM doesn't use it) but not wrong. I wouldn't object to removal.



how does Qemu handle that?

The normal qemu startup ignores the return value of the madvise. Only the
recent post migration changes want to disable huge pages for userfaultd.
And this code checks the return value. And yes, we should change that
in QEMU.

Great, thanks :)

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux