On Mon, Oct 17, 2022 at 19:33, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 17.10.22 11:48, 黄杰 wrote:
> > On Mon, Oct 17, 2022 at 16:44, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 12.10.22 10:15, Albert Huang wrote:
> >>> From: "huangjie.albert" <huangjie.albert@xxxxxxxxxxxxx>
> >>>
> >>> Implement these two functions so that we can set a mempolicy on
> >>> the inode of a hugetlb file. This ensures that the mempolicy of
> >>> all processes sharing this huge page file is consistent.
> >>>
> >>> Consider a scenario where huge pages are shared: we want to limit
> >>> the VM's memory usage to node0, so we bind QEMU's mempolicy to
> >>> node0. But if another process (such as virtiofsd) shares memory
> >>> with the VM and the page fault is triggered by virtiofsd, the
> >>> memory may be allocated on node1, depending on virtiofsd's own
> >>> mempolicy.
> >>>
> >>
> >> Any VM that uses hugetlb should be preallocating memory. For example,
> >> this is the expected default under QEMU when using huge pages.
> >>
> >> Once preallocation does the right thing regarding NUMA policy, there is
> >> no need to worry about it in other sub-processes.
> >>
> >
> > Hi, David,
> > thanks for the reminder.
> >
> > Yes, you are absolutely right: the pre-allocation mechanism does
> > solve this problem. However, some scenarios prefer not to use
> > pre-allocation, such as scenarios that are sensitive to virtual
> > machine startup time, or scenarios that require high memory
> > utilization. There, the on-demand allocation mechanism may be
> > better, so the key point is to find a way to support a shared
> > policy.
>
> Using hugetlb -- with a fixed pool size -- without preallocation is like
> playing with fire. Hugetlb reservation makes one believe that on-demand
> allocation is going to work, but there are various scenarios where that
> can go seriously wrong, and you can run out of huge pages.
>
> If you're using hugetlb as a memory backend for a VM without
> preallocation, you really have to be very careful. I can only advise
> against doing that.
>
>
> Also: why does another process read/write *first* to a guest physical
> memory location before the OS running inside the VM even initialized
> that memory? That sounds very wrong. What am I missing?
>

For example: the virtio ring buffer. For the avail descriptors, the
guest kernel only hands an address to the backend; it does not actually
access the memory itself.

Thanks.

> --
> Thanks,
>
> David / dhildenb
>
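
To make the proposal above concrete: a minimal sketch of what such
set_policy/get_policy hooks could look like, modeled on shmem's existing
shared-policy code. The "policy" member of hugetlbfs_inode_info is an
assumption of this sketch (it would have to be added and initialized with
mpol_shared_policy_init() when the inode is created), not necessarily the
patch's exact layout:

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/mempolicy.h>
#include <linux/hugetlb.h>

/* Sketch only: assumes "struct shared_policy policy" has been added
 * to struct hugetlbfs_inode_info. */

static int hugetlbfs_set_policy(struct vm_area_struct *vma,
				struct mempolicy *mpol)
{
	struct inode *inode = file_inode(vma->vm_file);

	/* Store the policy on the inode so that every process mapping
	 * this file (QEMU, virtiofsd, ...) sees the same policy. */
	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
}

static struct mempolicy *hugetlbfs_get_policy(struct vm_area_struct *vma,
					      unsigned long addr)
{
	struct inode *inode = file_inode(vma->vm_file);
	pgoff_t index;

	/* Translate the faulting address into a file offset and look up
	 * the policy recorded on the inode for that range. */
	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
}

With hooks like these wired into hugetlb's vm_operations_struct, a policy
set by one sharer via mbind() would also steer allocations for faults
taken by the other sharers, which is exactly the virtiofsd case described
in the commit message.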
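
And for the preallocation approach David recommends: QEMU can prefault
the hugetlb backing under the desired NUMA policy at startup, so no later
fault by a sharer can land on the wrong node. One possible invocation
(sizes and paths are placeholders):

qemu-system-x86_64 \
    -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on,prealloc=on,policy=bind,host-nodes=0 \
    -numa node,memdev=mem0 \
    ...

With prealloc=on, every page is touched while the bind policy is in
effect, so the question of who faults first disappears; the trade-off is
the startup-time cost mentioned above.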
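
For reference on the avail-descriptor point, the descriptor layout from
include/uapi/linux/virtio_ring.h shows why the backend can be the first
to touch the page: the guest publishes only an address/length pair, and
for device-writable buffers (VRING_DESC_F_WRITE) the backend's write is
the first access to that memory:

struct vring_desc {
	__virtio64 addr;	/* guest-physical address of the buffer */
	__virtio32 len;		/* length of the buffer */
	__virtio16 flags;	/* e.g. VRING_DESC_F_NEXT, VRING_DESC_F_WRITE */
	__virtio16 next;	/* chaining to the next descriptor, if any */
};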