Quoting Lepton Wu (2020-07-07 05:21:00) > For pages which are allocated in ttm with transparent huge pages, > tail pages have zero as reference count. The current vgem code use > get_page on them and it will trigger BUG when release_pages get called > on those pages later. > > Here I attach the minimal code to trigger this bug on a qemu VM which > enables virtio gpu (card1) and vgem (card0). BTW, since the upstream > virtio gpu has changed to use drm gem and moved away from ttm. So we > have to use an old kernel like 5.4 to reproduce it. But I guess > same thing could happen for a real GPU if the real GPU use similar code > path to allocate pages from ttm. I am not sure about two things: first, do we > need to fix this? will a real GPU hit this code path? Second, suppose this > need to be fixed, should this be fixed in ttm side or vgem side? If we remove > "huge_flags &= ~__GFP_COMP" from ttm_get_pages, then get_page in vgem works > fine. But it's there for fix another bug: > https://bugs.freedesktop.org/show_bug.cgi?id=103138 > It could also be "fixed" with this patch. But I am really not sure if this is > the way to go. > > Here is the code to reproduce this bug: > > unsigned int WIDTH = 1024; > unsigned int HEIGHT = 513; > unsigned int size = WIDTH * HEIGHT * 4; > > int work(int vfd, int dfd, int handle) { > int ret; > struct drm_prime_handle hf = {.handle = handle }; > ret = ioctl(dfd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &hf); > fprintf(stderr, "fd is %d\n", hf.fd); > hf.flags = DRM_CLOEXEC | DRM_RDWR; > ret = ioctl(vfd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &hf); > fprintf(stderr, "new handle is %d\n", hf.handle); > struct drm_mode_map_dumb map = {.handle = hf.handle }; > ret = ioctl(vfd, DRM_IOCTL_MODE_MAP_DUMB, &map); > fprintf(stderr, "need map at offset %lld\n", map.offset); > unsigned char * ptr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, vfd, > map.offset); > memset(ptr, 2, size); > munmap(ptr, size); > } > > int main() > { > int vfd = open("/dev/dri/card0", O_RDWR); // vgem > int dfd = open("/dev/dri/card1", O_RDWR); // virtio gpu > > int ret; > struct drm_mode_create_dumb ct = {}; > > ct.height = HEIGHT; > ct.width = WIDTH; > ct.bpp = 32; > ret = ioctl(dfd, DRM_IOCTL_MODE_CREATE_DUMB, &ct); > work(vfd, dfd, ct.handle); > fprintf(stderr, "done\n"); > } > > Signed-off-by: Lepton Wu <ytht.net@xxxxxxxxx> > --- > drivers/gpu/drm/vgem/vgem_drv.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c > index ec1a8ebb6f1b..be3d97e29804 100644 > --- a/drivers/gpu/drm/vgem/vgem_drv.c > +++ b/drivers/gpu/drm/vgem/vgem_drv.c > @@ -87,9 +87,8 @@ static vm_fault_t vgem_gem_fault(struct vm_fault *vmf) > > mutex_lock(&obj->pages_lock); > if (obj->pages) { > - get_page(obj->pages[page_offset]); > - vmf->page = obj->pages[page_offset]; > - ret = 0; > + ret = vmf_insert_pfn(vmf->vma, vmf->address, > + page_to_pfn(obj->pages[page_offset])); The danger here is that you store a pointer to a physical page in the CPU page tables and never remove it should the vgem bo be unpinned and the obj->pages[] released [and the physical pages returned to the system for reuse]. So judicial adding of drm_vma_node_unmap(&bo->base.vma_node, bo->base.anon_inode->i_mapping); drm_gem_put_pages, and also the kvfree in vgem_gem_free_object is suspect (for not being a drm_gem_put_pages). -Chris _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel