Re: It appears drm-next TTM cleanup broke something . . .

Hi Kevin,

The basic problem you are facing is that ttm_tt_create/destroy are mandatory (they always were). You need an implementation; otherwise you won't be able to use the system domain (in addition to the optional GTT domain).

My best guess is that the difference is that we now force initialization of the system domain for all drivers.

If that is correct, you simply never ran into this before because you never correctly initialized TTM to support buffer moves.

I'm not sure what exactly the OpenChrome DRM driver is doing, but I strongly suggest just dropping TTM support completely and using the GEM VRAM helper layer instead.

Regards,
Christian.

On 19.10.20 at 09:23, Kevin Brace wrote:
Hi Dave,

Yes, with the workaround I mentioned in my previous e-mail, OpenChrome DRM no longer crashes due to the "ttm_tt_create" member being null.
However, it is still unable to start the X Server due to another TTM-related memory allocation issue.
I think the sweeping TTM changes made during this development cycle broke OpenChrome DRM.
Following up on the question I raised in my previous e-mail:
Shouldn't passing "false" for the "use_tt" parameter of ttm_range_man_init() disable TTM TT functionality?
I feel like that should be the expected behavior.
Again, there are only 5 to 6 days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent).
Assuming what I am asserting is correct, I think this was not discovered earlier for the following reasons:

1) nouveau, radeon, and amdgpu already use TTM TT functionality.
2) ast uses the GEM VRAM helper, which internally uses TTM. The helper populates the "ttm_tt_create" and "ttm_tt_destroy" members, hence the developers did not notice the breakage.
3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.


Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, so I did not believe it was necessary to populate the "ttm_tt_create" and "ttm_tt_destroy" members.
That implementation worked fine up to the previous development cycle's code.
Of course, I will eventually add support for acceleration, at which point TTM TT functionality will be used.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


Sent: Sunday, October 18, 2020 at 12:50 PM
From: "Dave Airlie" <airlied@xxxxxxxxx>
To: "Kevin Brace" <kevinbrace@xxxxxxx>, "Christian König" <ckoenig.leichtzumerken@xxxxxxxxx>
Cc: "dri-devel" <dri-devel@xxxxxxxxxxxxxxxxxxxxx>, "Dave Airlie" <airlied@xxxxxxxxxx>
Subject: Re: It appears drm-next TTM cleanup broke something . . .

On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@xxxxxxx> wrote:
Hi Dave,

It is a little urgent, so I am writing this right now.
As usual, a few days ago I pulled the latest DRM repository code into the out-of-tree OpenChrome DRM repository.
While making the changes OpenChrome DRM needs in order to compile against the latest Linux kernel, I noticed that ttm_bo_init_mm() was removed and replaced with ttm_range_man_init().
ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
If I leave the "ttm_tt_create" member of struct ttm_bo_driver null by not specifying it, TTM still tries to call it and crashes with a NULL pointer dereference.
The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
cc'ing Christian,

I can't remember if we did this deliberately or if it just worked by
accident previously.

Either way, you will probably need a ttm_tt_create going forward.

Dave.

_______________________________________________
. . .
kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
. . .
kernel: [   34.310807] Call Trace:
kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
. . .
_______________________________________________

The erroneous call to the "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
Currently, OpenChrome DRM's TTM implementation does not use the "ttm_tt_create" member, and this arrangement worked fine up to Linux 5.9's drm-next code.
It appears that Linux 5.10's drm-next code broke it.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
