On 17.10.22 09:32, Baolin Wang wrote:
> When creating a virtual machine, we use memfd_create() to get a file descriptor that can be used to create shared memory mappings with mmap(). The mmap() call sets the MAP_POPULATE flag to allocate physical pages for the virtual machine. When allocating physical pages for the guest, the host can fall back to allocating some CMA pages for the guest when over half of the zone's free memory is in the CMA area.
>
> In the guest OS, when an application wants to do a data transaction with DMA, our QEMU calls the VFIO_IOMMU_MAP_DMA ioctl to longterm-pin the DMA pages and create IOMMU mappings for them. However, we found that the VFIO_IOMMU_MAP_DMA ioctl sometimes fails to longterm-pin the physical pages.
>
> After some investigation, we found that the pages used for the DMA mapping can contain some CMA pages, and these CMA pages can cause the longterm-pin to fail because they cannot be migrated. The migration failure may be caused by a temporary reference count or a memory allocation failure. As a result, the VFIO_IOMMU_MAP_DMA ioctl returns an error, which makes the application fail to start.
>
> To fix this issue, this patch introduces a new madvise behavior, named MADV_NOMOVABLE, to avoid allocating CMA pages and movable pages if the user wants to do a longterm-pin, which removes the possible failure of movable or CMA page migration.
Sorry to say, but that sounds like a hack to work around a kernel implementation detail (how often we retry to migrate pages).
If there are CMA/ZONE_MOVABLE issues, please fix them instead, and avoid leaking these details to user space.
Also, with MAP_POPULATE as described by you, this madvise flag doesn't make too much sense, because it will get set after all memory was already allocated ...
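
The ordering problem can be illustrated with a minimal sketch, assuming Linux with glibc 2.27 or later (for the memfd_create() wrapper). The function name resident_after_populate() and the memfd name "guest-ram" are hypothetical, chosen only for this example. It maps a memfd with MAP_POPULATE and uses mincore() to count the pages that are already resident when mmap() returns. The proposed MADV_NOMOVABLE flag is not in mainline headers, so it is deliberately not used; the point is only that any madvise() on this range can be issued no earlier than the marked spot. MAP_POPULATE is best effort, so the count is reported rather than assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map npages of a memfd with MAP_SHARED | MAP_POPULATE and return how many
 * of those pages are resident as soon as mmap() returns, or -1 on error.
 */
long resident_after_populate(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t len = npages * (size_t)page;
    long resident = -1;
    unsigned char *vec = NULL;
    void *addr = MAP_FAILED;

    int fd = memfd_create("guest-ram", 0);  /* illustrative name */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0)
        goto out;

    /* The kernel allocates the backing pages while servicing this call. */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_POPULATE, fd, 0);
    if (addr == MAP_FAILED)
        goto out;

    /*
     * Earliest point at which madvise(addr, len, ...) could be called:
     * the address is only known now, after population has already run.
     */
    vec = malloc(npages);
    if (vec && mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;  /* bit 0 set: page is resident */
    }

out:
    free(vec);
    if (addr != MAP_FAILED)
        munmap(addr, len);
    close(fd);
    return resident;
}
```

Calling resident_after_populate(16) and printing the result shows how many of the 16 pages were allocated before user space had any chance to pass a hint for that range.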
NAK

--
Thanks,

David / dhildenb