Hi Christian, I looked into a few more things, and figured out why OpenChrome DRM was not booting X Server. Now the situation is under control in my side of the world (OpenChrome development world), and I got X Server working again with drm-next 5.10 code and OpenChrome DRM. Code will be committed into OpenChrome DRM upstream repository's drm-next-5.10 branch in the next few days. I can see that OpenChrome DRM "just happened to" work without ttm_tt_create/destroy callbacks being specified. I can fix this issue easily. The other issue I was facing was that commit 48e07c23cbeba2a2cda7ca73be0015e727818536 (drm/ttm: nuke memory type flags) discontinued TTM_PL_FLAG_* flags. This caused incompatibility between OpenChrome DDX and the updated OpenChrome DRM that incorporated the changes made with commit 48e07c2. OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM. I changed OpenChrome DDX code to use TTM_PL_* instead for allocating memory via TTM, and OpenChrome DRM based Linux kernel was able to boot X Server properly. Again, the code change to the DDX will happen in a few days. Before I moved on from this issue, I will like to ask for one request regarding TTM and its callbacks. Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks? Although I did not bring it up with you back then, I did get hit with an issue similar to this 3 years ago. https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-4.13&id=3c08ec601bb1ccd5ff58a9101317b728aa7204bd https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-4.13&id=2064175f977d9859831be653df16e3ea10415a8a I did not send you an e-mail about it, I do recall complaining to Alex Deucher about the above io_mem_pfn null pointer bug at XDC2017 (I did attend it, and I presented about the state of OpenChrome Project.). I brought this up with Alex because I stuck for more than 2 weeks to figure out the issue (the commit for overcoming io_mem_pfn callback issue was made on 2017-09-13, but the commit prior to that was on 2017-08-28). https://cgit.freedesktop.org/openchrome/drm-openchrome/log/?h=drm-next-4.13&ofs=50 Anyway, someone else got hit by the same issue a month or two later https://lists.freedesktop.org/archives/dri-devel/2017-November/159002.html https://lists.freedesktop.org/archives/dri-devel/2017-December/161168.html These are the commits that fixed the issue. https://cgit.freedesktop.org/drm/drm/commit/?id=ea642c3216cb2a60d1c0e760ae47ee85c9c16447 https://cgit.freedesktop.org/drm/drm/commit/?id=c67fa6edc8b11afe22c88a23963170bf5f151acf If you compare the commit date, I figured it out io_mem_pfn null pointer bug 2 to 3 months earlier. I think the problem we got with the latest code change to the TTM is, many TTM callbacks do not check for a null pointer for mandatory callbacks. Also, the ttm_bo_driver struct comment section inside ttm_bo_driver.h does not make it 100% clear that what is mandatory and what is not. If these were done, I am sure I would have left ttm_tt_create/destroy callbacks intact. https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-5.10&id=64947142a1cf8b83faa73da7aa35a17f6a24568a (scroll down to openchrome/openchrome_ttm.c) Regarding your suggestion that I should abandon OpenChrome DRM's current GEM / TTM implementation and instead I should use the GEM VRAM helpers, since I was able to figure out the issues with some suggestions from you and Dave, I do not think it is worth switching to GEM VRAM helpers at this point. Obviously, the current implementation does not support acceleration, hence, it appears to be a good candidate for switching to GEM VRAM helpers. While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers. Regards, Kevin Brace Brace Computer Laboratory blog https://bracecomputerlab.com > Sent: Monday, October 19, 2020 at 3:13 AM > From: "Christian König" <ckoenig.leichtzumerken@xxxxxxxxx> > To: "Kevin Brace" <kevinbrace@xxxxxxx>, "Dave Airlie" <airlied@xxxxxxxxx>, "dri-devel" <dri-devel@xxxxxxxxxxxxxxxxxxxxx> > Subject: Re: It appears drm-next TTM cleanup broke something . . . > > Hi Kevin, > > the basic problem you are facing is that ttm_tt_create/destroy is > mandatory (It always was). You need an implementation or otherwise you > won't be able to use the system domain (additional to the optional GTT > domain). > > My best guess is that the difference is that we now force to initiate > the system domain for all drivers. > > If that is correct you just that you never ran into because you never > correctly initialized TTM to support buffer moves. > > I'm not sure what exactly the OpenChrome DRM driver is doing, but I > strongly suggest to just drop TTM support completely and use the GEM > VRAM helper layer instead. > > Regards, > Christian. > > Am 19.10.20 um 09:23 schrieb Kevin Brace: > > Hi Dave, > > > > Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null. > > It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from. > > I think making huge changes to TTM during this development cycle broke OpenChrome DRM. > > Following up on the question I raised during the previous e-mail. > > Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality? > > I feel like that should be the expected behavior. > > Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.). > > Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons. > > > > 1) nouveau, radeon, and amdgpu already use TTM TT functionality. > > 2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage. > > 3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now. > > > > > > Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members. > > That implementation worked fine until the previous development cycle code. > > Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point. > > > > Regards, > > > > Kevin Brace > > Brace Computer Laboratory blog > > https://bracecomputerlab.com > > > > > >> Sent: Sunday, October 18, 2020 at 12:50 PM > >> From: "Dave Airlie" <airlied@xxxxxxxxx> > >> To: "Kevin Brace" <kevinbrace@xxxxxxx>, "Christian König" <ckoenig.leichtzumerken@xxxxxxxxx> > >> Cc: "dri-devel" <dri-devel@xxxxxxxxxxxxxxxxxxxxx>, "Dave Airlie" <airlied@xxxxxxxxxx> > >> Subject: Re: It appears drm-next TTM cleanup broke something . . . > >> > >> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@xxxxxxx> wrote: > >>> Hi Dave, > >>> > >>> It is a little urgent, so I am writing this right now. > >>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago. > >>> While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init(). > >>> ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly. > >>> If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access. > >>> The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c. > >>> This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null). > >> cc'ing Christian, > >> > >> I can't remember if we did this deliberately or if just worked by > >> accident previously. > >> > >> Either way, you should probably need a ttm_tt_create going forward. > >> > >> Dave. > >> > >>> _______________________________________________ > >>> . . . > >>> kernel: [ 34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create. > >>> kernel: [ 34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement. > >>> kernel: [ 34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement. > >>> kernel: [ 34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000 > >>> kernel: [ 34.310742] #PF: supervisor instruction fetch in kernel mode > >>> kernel: [ 34.310745] #PF: error_code(0x0010) - not-present page > >>> . . . > >>> kernel: [ 34.310807] Call Trace: > >>> kernel: [ 34.310827] ttm_tt_create+0x5f/0xa0 [ttm] > >>> kernel: [ 34.310839] ttm_bo_validate+0xb8/0x140 [ttm] > >>> kernel: [ 34.310886] ? drm_vma_offset_add+0x56/0x70 [drm] > >>> kernel: [ 34.310897] ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome] > >>> . . . > >>> _______________________________________________ > >>> > >>> The erroneous call to "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()). > >>> Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code. > >>> It appears that Linux 5.10's drm-next code broke the code. > >>> > >>> Regards, > >>> > >>> Kevin Brace > >>> Brace Computer Laboratory blog > >>> https://bracecomputerlab.com > >>> > >>> _______________________________________________ > >>> dri-devel mailing list > >>> dri-devel@xxxxxxxxxxxxxxxxxxxxx > >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel