On 08/10/2011 08:10 AM, David Gibson wrote:
On Mon, Aug 08, 2011 at 11:24:09AM +0300, Avi Kivity wrote:
> On 08/08/2011 09:03 AM, David Gibson wrote:
> >Second, if userspace qemu passing hugepages to kvm can cause (host)
> >kernel memory corruption, that is clearly a host kernel bug. So am I
> >correct in thinking this is basically just a safety feature if qemu is
> >run on a buggy kernel?
>
> Seems so, yes. 2.6.2[456] are exploitable. We only found out after
> these were all released.
>
> >Presumably this bug was corrected at some
> >point? Is the presence of the SYNC_MMU feature just being used as a
> >proxy for "is this kernel recent enough to have the corruption bug
> >fixed"?
>
> SYNC_MMU actually fixes the bug.

Ah, so SYNC_MMU fixed the bug on x86, and all the other archs without
SYNC_MMU were left with a serious memory corruption bug, under a
userspace bandaid. Thanks for that.

Unfortunately it's all too easy to ignore non-x86.
Arguably, not implementing SYNC_MMU is a bug in itself,
as it allows userspace to pin arbitrary amounts of user memory. At
least on x86 we had shrinkers that kill off shadow page tables under
memory pressure, unpinning memory, but I don't see it on ppc.

As I understand the bug that causes the problem, it's because removing
all the hugepage VMAs from userspace will cause the inode (and
therefore address_space) for the hugepage file to be freed, but not
the pages (because another ref is held by kvm). Then when kvm
releases the pages, the address_space will be touched after free from
free_huge_page().
This would seem to be a genuine bug in the hugepage code, which has
just been hidden by SYNC_MMU. It should be quite easy to fix - the
mapping is only stored in the struct page to get to the hugetlbfs
superblock, so we could just store a direct superblock pointer
instead, and bump its refcount when we put that in the page private
pointer.
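The ownership idea above can be sketched as a small userspace toy model.
This is not kernel code: struct toy_sb, struct toy_page and the toy_*
helpers are hypothetical names invented for illustration, and the real
hugetlbfs structures and locking are more involved. The sketch only shows
the principle that a page which pins the object it touches on release
cannot hit a use-after-free, whatever order the other holders go away in.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hugetlbfs superblock (not the kernel struct). */
struct toy_sb {
    int refcount;
    long free_pages;    /* accounting that is updated when a page is released */
};

/* Hypothetical stand-in for a huge page. */
struct toy_page {
    int refcount;
    struct toy_sb *sb;  /* direct pointer, pinned for the page's lifetime */
};

static struct toy_sb *toy_sb_new(void)
{
    struct toy_sb *sb = calloc(1, sizeof(*sb));
    if (sb)
        sb->refcount = 1;   /* reference held on behalf of the inode */
    return sb;
}

/* Drop one superblock reference; returns 1 if this call freed it. */
static int toy_sb_put(struct toy_sb *sb)
{
    assert(sb->refcount > 0);
    if (--sb->refcount > 0)
        return 0;
    free(sb);
    return 1;
}

/* Create a page that takes its own reference on the superblock. */
static struct toy_page *toy_page_new(struct toy_sb *sb)
{
    struct toy_page *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->refcount = 1;
    sb->refcount++;         /* the page now pins the superblock */
    p->sb = sb;
    return p;
}

/*
 * Drop one page reference. On the last put, touch the superblock
 * accounting and only then release the pin. Returns 1 if the
 * superblock was freed as a result.
 */
static int toy_page_put(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount > 0)
        return 0;
    struct toy_sb *sb = p->sb;
    sb->free_pages++;       /* safe: our own reference keeps sb alive */
    free(p);
    return toy_sb_put(sb);
}
```

In this model the inode side can call toy_sb_put() while another holder
(kvm, in the scenario above) still has a page reference; the superblock is
only freed by the final toy_page_put().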
But then I'm not sure how qemu would detect that it's on a kernel
where the bug is fixed and allow -mem-path to be used again. Any
ideas?
If it's just a kernel bug, the fix belongs in the kernel, not in qemu.
We used to have KVM_CAPs to declare this sort of thing
(KVM_CAP_HUGETLBFS_WORKS_EVEN_WITHOUT_SYNC_MMU) but I don't think it was
a good idea.
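
For reference, the SYNC_MMU proxy check discussed above comes down to a
KVM_CHECK_EXTENSION ioctl on the /dev/kvm file descriptor. A minimal
sketch, assuming a Linux host with the kernel headers installed; the
helper name kvm_has_sync_mmu is made up for illustration and is not
qemu's actual function:

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Returns 1 if the kernel behind kvm_fd reports KVM_CAP_SYNC_MMU,
 * 0 otherwise. An ioctl failure (for example a bad descriptor) is
 * treated as "not available".
 */
static int kvm_has_sync_mmu(int kvm_fd)
{
    int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SYNC_MMU);
    return ret > 0;     /* 0 means unsupported, -1 means the ioctl failed */
}
```

A caller would open /dev/kvm with open("/dev/kvm", O_RDWR), pass the
resulting descriptor to the helper, and refuse -mem-path when it
returns 0.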
--
error compiling committee.c: too many arguments to function