RE: [PATCH] udmabuf: revert 'Add support for mapping hugepages (v4)'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Mike,

Sorry for the late reply; I just got back from vacation.
If it is unsafe to directly use the subpages of a hugetlb page, then reverting
this patch seems like the only option for addressing this issue immediately.
So, this patch is
Acked-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>

As far as the use-case is concerned, there are two main users of the udmabuf
driver: Qemu and CrosVM VMMs. However, it appears Qemu is the only one
that uses hugetlb pages (when hugetlb=on is set) as the backing store for
Guest (Linux, Android and Windows) system memory. The main goal is to
share the pages associated with the Guest allocated framebuffer (FB) with
the Host GPU driver and other components in a zero-copy way. To that end,
the guest GPU driver (virtio-gpu) allocates 4k size pages (associated with
the FB) and pins them before sharing the (guest) physical (or dma) addresses
(and lengths) with Qemu. Qemu then translates the addresses into file
offsets and shares these offsets with udmabuf. 

The udmabuf driver obtains the pages associated with the file offsets and
uses these pages to eventually populate a scatterlist. It also creates a 
dmabuf fd and acts as the exporter. AFAIK, it should be possible to populate
the scatterlist with physical/dma addresses (of huge pages) instead of using
subpages but this might limit the capabilities of some (dmabuf) importers.
I'll try to figure out a solution using physical/dma addresses and see if it
works as expected, and will share the patch on linux-mm to request
feedback once it is ready.

Thanks,
Vivek

> 
> This effectively reverts commit 16c243e99d33 ("udmabuf: Add support
> for mapping hugepages (v4)").  Recently, Junxiao Chang found a BUG
> with page map counting as described here [1].  This issue pointed out
> that the udmabuf driver was making direct use of subpages of hugetlb
> pages.  This is not a good idea, and no other mm code attempts such use.
> In addition to the mapcount issue, this also causes issues with hugetlb
> vmemmap optimization and page poisoning.
> 
> For now, remove hugetlb support.
> 
> If udmabuf wants to be used on hugetlb mappings, it should be changed to
> only use complete hugetlb pages.  This will require different alignment
> and size requirements on the UDMABUF_CREATE API.
> 
> [1] https://lore.kernel.org/linux-mm/20230512072036.1027784-1-
> junxiao.chang@xxxxxxxxx/
> 
> Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  drivers/dma-buf/udmabuf.c | 47 +++++----------------------------------
>  1 file changed, 6 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 01f2e86f3f7c..12cf6bb2e3ce 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -12,7 +12,6 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/slab.h>
>  #include <linux/udmabuf.h>
> -#include <linux/hugetlb.h>
>  #include <linux/vmalloc.h>
>  #include <linux/iosys-map.h>
> 
> @@ -207,9 +206,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	struct udmabuf *ubuf;
>  	struct dma_buf *buf;
>  	pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit;
> -	struct page *page, *hpage = NULL;
> -	pgoff_t subpgoff, maxsubpgs;
> -	struct hstate *hpstate;
> +	struct page *page;
>  	int seals, ret = -EINVAL;
>  	u32 i, flags;
> 
> @@ -245,7 +242,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  		if (!memfd)
>  			goto err;
>  		mapping = memfd->f_mapping;
> -		if (!shmem_mapping(mapping) &&
> !is_file_hugepages(memfd))
> +		if (!shmem_mapping(mapping))
>  			goto err;
>  		seals = memfd_fcntl(memfd, F_GET_SEALS, 0);
>  		if (seals == -EINVAL)
> @@ -256,48 +253,16 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			goto err;
>  		pgoff = list[i].offset >> PAGE_SHIFT;
>  		pgcnt = list[i].size   >> PAGE_SHIFT;
> -		if (is_file_hugepages(memfd)) {
> -			hpstate = hstate_file(memfd);
> -			pgoff = list[i].offset >> huge_page_shift(hpstate);
> -			subpgoff = (list[i].offset &
> -				    ~huge_page_mask(hpstate)) >>
> PAGE_SHIFT;
> -			maxsubpgs = huge_page_size(hpstate) >>
> PAGE_SHIFT;
> -		}
>  		for (pgidx = 0; pgidx < pgcnt; pgidx++) {
> -			if (is_file_hugepages(memfd)) {
> -				if (!hpage) {
> -					hpage =
> find_get_page_flags(mapping, pgoff,
> -
> FGP_ACCESSED);
> -					if (!hpage) {
> -						ret = -EINVAL;
> -						goto err;
> -					}
> -				}
> -				page = hpage + subpgoff;
> -				get_page(page);
> -				subpgoff++;
> -				if (subpgoff == maxsubpgs) {
> -					put_page(hpage);
> -					hpage = NULL;
> -					subpgoff = 0;
> -					pgoff++;
> -				}
> -			} else {
> -				page =
> shmem_read_mapping_page(mapping,
> -							       pgoff + pgidx);
> -				if (IS_ERR(page)) {
> -					ret = PTR_ERR(page);
> -					goto err;
> -				}
> +			page = shmem_read_mapping_page(mapping, pgoff
> + pgidx);
> +			if (IS_ERR(page)) {
> +				ret = PTR_ERR(page);
> +				goto err;
>  			}
>  			ubuf->pages[pgbuf++] = page;
>  		}
>  		fput(memfd);
>  		memfd = NULL;
> -		if (hpage) {
> -			put_page(hpage);
> -			hpage = NULL;
> -		}
>  	}
> 
>  	exp_info.ops  = &udmabuf_ops;
> --
> 2.40.1






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux