Re: Oops in qxl_bo_move_notify()

Yeah, that's an already known issue.

When the allocation fails, bo->resource might now be NULL, and we need to add checks for that corner case as well.

Christian.

On 08.07.21 at 12:14, Daniel Vetter wrote:
On Wed, Jul 07, 2021 at 04:36:49PM +0000, Roberto Sassu wrote:
Hi

I'm getting this oops (on commit a180bd1d7e16):

     [   17.711520] BUG: kernel NULL pointer dereference, address: 0000000000000010
     [   17.739451] RIP: 0010:qxl_bo_move_notify+0x35/0x80 [qxl]
     [   17.827345] RSP: 0018:ffffc90000457c08 EFLAGS: 00010286
     [   17.827350] RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
     [   17.827353] RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffffffff85596feb
     [   17.827356] RBP: ffff88800e311c00 R08: 0000000000000000 R09: 0000000000000000
     [   17.827358] R10: ffffffff8697b243 R11: fffffbfff0d2f648 R12: 0000000000000000
     [   17.827361] R13: ffff88800e311e48 R14: ffff88800e311e98 R15: ffff88800e311e90
     [   17.827364] FS:  0000000000000000(0000) GS:ffff88805d800000(0000) knlGS:0000000000000000
     [   17.861699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [   17.861703] CR2: 0000000000000010 CR3: 000000002642c000 CR4: 0000000000350ee0
     [   17.861707] Call Trace:
     [   17.861712]  ttm_bo_cleanup_memtype_use+0x4d/0xb0 [ttm]
     [   17.861730]  ttm_bo_release+0x42d/0x7c0 [ttm]
     [   17.861746]  ? ttm_bo_cleanup_refs+0x127/0x420 [ttm]
     [   17.888300]  ttm_bo_delayed_delete+0x289/0x390 [ttm]
     [   17.888317]  ? ttm_bo_cleanup_refs+0x420/0x420 [ttm]
     [   17.888332]  ? lock_release+0x9c/0x5c0
     [   17.901033]  ? rcu_read_lock_held_common+0x1a/0x50
     [   17.905183]  ttm_device_delayed_workqueue+0x18/0x50 [ttm]
     [   17.909371]  process_one_work+0x537/0x9f0
     [   17.913345]  ? pwq_dec_nr_in_flight+0x160/0x160
     [   17.917297]  ? lock_acquired+0xa4/0x580
     [   17.921168]  ? worker_thread+0x169/0x600
     [   17.925034]  worker_thread+0x7a/0x600
     [   17.928657]  ? process_one_work+0x9f0/0x9f0
     [   17.932360]  kthread+0x200/0x230
     [   17.935930]  ? set_kthread_struct+0x80/0x80
     [   17.939593]  ret_from_fork+0x22/0x30
     [   17.951737] CR2: 0000000000000010
     [   17.955496] ---[ end trace e30cc21c24e81ee5 ]---

I had a look at the code, and it seems that this is caused by
trying to use bo->resource which is NULL.

bo->resource is freed by ttm_bo_cleanup_refs() ->
ttm_bo_cleanup_memtype_use() -> ttm_resource_free().

And then a notification is issued by ttm_bo_cleanup_refs() ->
ttm_bo_put() -> ttm_bo_release() ->
ttm_bo_cleanup_memtype_use(), this time with bo->resource
equal to NULL.

I was thinking about a proper way to fix this. Checking that
bo->resource is not NULL in qxl_bo_move_notify() would
solve the issue. But maybe there is a better way, like
avoiding that ttm_bo_cleanup_memtype_use() is called
twice. Which way would be preferable?
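
For illustration, the first option (a NULL check in the notify callback)
can be sketched as a small userspace C mock. Every name below (mock_bo,
mock_resource, mock_bo_move_notify, mock_bo_cleanup_memtype_use,
MOCK_PL_PRIV) is hypothetical and only stands in for the TTM/qxl
structures; this is not the kernel code and not an actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM structures, not the kernel definitions. */
struct mock_resource {
    int mem_type;
};

struct mock_bo {
    struct mock_resource *resource; /* NULL once freed or if allocation failed */
    int notify_count;               /* notifications that were actually handled */
};

#define MOCK_PL_PRIV 1 /* hypothetical placement id */

/* Returns true if the notification was handled, false if it was skipped. */
static bool mock_bo_move_notify(struct mock_bo *bo)
{
    assert(bo != NULL);

    /* Guard: without this check the mem_type read below would
     * dereference a NULL pointer on the second cleanup pass. */
    if (!bo->resource)
        return false;

    if (bo->resource->mem_type == MOCK_PL_PRIV)
        bo->notify_count++;
    return true;
}

/* Models the cleanup step: notify first, then drop the resource. */
static void mock_bo_cleanup_memtype_use(struct mock_bo *bo)
{
    mock_bo_move_notify(bo);
    bo->resource = NULL; /* stands in for freeing the resource */
}
```

Calling mock_bo_cleanup_memtype_use() twice on the same object models the
double invocation described above: the second call finds resource == NULL
and returns early instead of faulting. Whether the guard or avoiding the
second call is the better fix is exactly the open question here.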
Adding Christian and Dave, who've touched all this recently iirc.
-Daniel

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Li Jian, Shi Yanli



