On 4/21/20 11:43 AM, Gerd Hoffmann wrote: > On Sat, Apr 18, 2020 at 02:39:17PM +0800, Caicai wrote: >> When a qxl resource is released, the list that needs to be released is >> fetched from the linked list ring and cleared. When you empty the list, >> instead of trying to determine whether the ttm buffer object for each >> qxl in the list is locked, you release the qxl object and remove the >> element from the list until the list is empty. It was found that the >> linked list was cleared first, and that the lock on the corresponding >> ttm Bo for the QXL had not been released, so that the new qxl could not >> be locked when it used the TTM. > > So the dma_resv_reserve_shared() call in qxl_release_validate_bo() is > unbalanced? Because the dma_resv_unlock() call in > qxl_release_fence_buffer_objects() never happens due to > qxl_release_free_list() clearing the list beforehand? Is that correct? we observe similar issue: RHEL7 guests crashes in qxl_draw_opaque_fb() qxl_release_fence_buffer_objects() crashdump investigation shows that qlx_object was freed and reused, so its original content was re-written. At the same time qxl_device have empty release_idr, ant there are no allocated qxl_bo_list entries. i.e. qxl_release_free was really called. > The only way I see for this to happen is that the guest is preempted > between qxl_push_{cursor,command}_ring_release() and > qxl_release_fence_buffer_objects() calls. The host can complete the qxl > command then, signal the guest, and the IRQ handler calls > qxl_release_free_list() before qxl_release_fence_buffer_objects() runs. We think the same: qxl_release was freed by garbage collector before original thread had called qxl_release_fence_buffer_objects(). > Looking through the code I think it should be safe to simply swap the > qxl_release_fence_buffer_objects() + > qxl_push_{cursor,command}_ring_release() calls to close that race > window. Can you try that and see if it fixes the bug for you? I'm going to prepare and test such patch but I have one question here: qxl_push_*_ring_release can be called with interruptible=true and fail How to correctly handle this case? Is the hunk below correct from your POV? --- a/drivers/gpu/drm/qxl/qxl_ioctl.c +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c @@ -261,12 +261,8 @@ static int qxl_process_single_command(struct qxl_device *qdev, apply_surf_reloc(qdev, &reloc_info[i]); } + qxl_release_fence_buffer_objects(release); ret = qxl_push_command_ring_release(qdev, release, cmd->type, true); - if (ret) - qxl_release_backoff_reserve_list(release); <<<< ???? - else - qxl_release_fence_buffer_objects(release); - out_free_bos: out_free_release: Thank you, Vasily Averin _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel