> On Nov 11, 2019, at 9:57 AM, Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>
> Hi John
>
> On 08.11.19 at 19:07, John Donnelly wrote:
>>
>>> On Nov 8, 2019, at 9:06 AM, Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>>>
>>> Hi
>>>
>>> On 08.11.19 at 13:55, John Donnelly wrote:
>>>>
>>>>> On Nov 8, 2019, at 1:46 AM, Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>>>>>
>>>>> Hi John
>>>>>
>>>>> On 07.11.19 at 23:14, John Donnelly wrote:
>>>>>>
>>>>>>> On Nov 7, 2019, at 10:13 AM, John Donnelly <john.p.donnelly@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> On Nov 7, 2019, at 7:42 AM, Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi John
>>>>>>>>
>>>>>>>> On 07.11.19 at 14:12, John Donnelly wrote:
>>>>>>>>> Hi Thomas ; Thank you for reaching out.
>>>>>>>>>
>>>>>>>>> See inline:
>>>>>>>>>
>>>>>>>>>> On Nov 7, 2019, at 1:54 AM, Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> apparently the vgaarb was not the problem.
>>>>>>>>>>
>>>>>>>>>> On 07.11.19 at 03:29, John Donnelly wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am investigating an issue where we lose video activity when the display is switched from “text mode” to “graphic mode”
>>>>>>>>>>> on a number of servers using this driver. Specifically starting the GNOME desktop.
>>>>>>>>>>
>>>>>>>>>> When you say "text mode", do you mean VGA text mode or the graphical
>>>>>>>>>> console that emulates text mode?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I call “text mode” the 24x80 ascii mode ; - NOT GRAPHICS . Ie : run-level 3; So I guess your term for it is VGA.
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> When you enable graphics mode, does it set the correct resolution? A lot
>>>>>>>>>> of work went into memory management recently. I could imagine that the
>>>>>>>>>> driver sets the correct resolution, but then fails to display the
>>>>>>>>>> correct framebuffer.
>>>>>>>>>
>>>>>>>>> There is no display at all ; so there is no resolution to mention.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If possible, could you try to update to the latest drm-tip and attach
>>>>>>>>>> the output of
>>>>>>>>>>
>>>>>>>>>>   /sys/kernel/debug/dri/0/vram-mm
>>>>>>>>>
>>>>>>>>> I don’t see that file ; Is there something else I need to do ?
>>>>>>>>
>>>>>>>> That file is fairly new and maybe it's not in the mainline kernel yet.
>>>>>>>> See below for how to get it.
>>>>>>>
>>>>>>> I built your “tip” ; Still no graphics displayed .
>>>>>>>
>>>>>>> mount -t debugfs none /sys/kernel
>>>>>>>
>>>>>>> cat /proc/cmdline
>>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.4.0-rc6.drm.+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
>>>>>>>
>>>>>>> cat /sys/kernel/dri/0/vram-mm
>>>>>>>
>>>>>>> In VGA mode :
>>>>>>>
>>>>>>> cat /sys/kernel/dri/0/vram-mm
>>>>>>> 0x0000000000000000-0x0000000000000300: 768: used
>>>>>>> 0x0000000000000300-0x0000000000000600: 768: used
>>>>>>> 0x0000000000000600-0x00000000000007ee: 494: free
>>>>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
>>>>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
>>>>>>>
>>>>>>> In GRAPHICS mode ( if it matters )
>>>>>>>
>>>>>>> cat /sys/kernel/dri/0/vram-mm
>>>>>>> 0x0000000000000000-0x0000000000000300: 768: used
>>>>>>> 0x0000000000000300-0x0000000000000600: 768: used
>>>>>>> 0x0000000000000600-0x00000000000007ee: 494: free
>>>>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
>>>>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
>>>>>>> total: 2032, used 1538 free 494
>>>>>>>
>>>>>
>>>>> This is interesting. In the graphics mode, you see two buffers of 768
>>>>> pages each. That's the main framebuffers as used by X (it's double
>>>>> buffered).
>>>>> Then there's a free area and finally two pages for cursor
>>>>> images (also double buffered). That looks as expected.
>>>>>
>>>>> The thing is that in text mode, the areas are allocated. But the driver
>>>>> shouldn't be active, so the file shouldn't exist or only show a single
>>>>> free area.
>>>>>
>>>>
>>>> If you want me to double check this I will . I have GNOME installed , but the machine boots to runlevel 3, then I start the desktop using init 5. I am pretty sure I took that output when the machine was in graphics mode at runlevel 5 .
>>>>
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’ve attached : var/lib/gdm/.local/share/xorg/Xorg.0.log. ; instead ;
>>>>>>>>
>>>>>>>> Good! Looking through that log file, the card is found at line 79 and
>>>>>>>> the generic X modesetting driver initializes below. That works as expected.
>>>>>>>>
>>>>>>>> I noticed that several operations are not permitted (lines 78 and 87). I
>>>>>>>> guess you're starting X from a regular user account? IIRC special
>>>>>>>> permission is required to acquire control of the display. What happens
>>>>>>>> if you start X as root user?
>>>>>>>
>>>>>>> I am starting GNOME as root by doing “init 5” from either the console session or from ssh .
>>>>>>>
>>>>>>> The default runlevel is 3 on boot .
>>>>>>>
>>>>>>> On failing session running your 5.4.0.rc6.
>>>>>>>
>>>>>>> 78 [ 237.712] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
>>>>>>>
>>>>>>> 87 [ 237.712] (EE) open /dev/fb0: Permission denied
>>>>>>>
>>>>>>> Booting 4.18 kernel yields the same error results in: /var/lib/gdm/.local/share/xorg/Xorg.0.log
>>>>>>>
>>>>>>> 78 [ 101.334] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
>>>>>>>
>>>>>>> 87 [ 101.334] (EE) open /dev/fb0: Permission denied
>>>>>>>
>>>>>>> What is strange is that the X log files ( bad and Ok ) essentially appear as if GNOME started !
>>>>>>>
>>>>>>> <Xorg.0.log.bad><Xorg.0.log.Ok>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is my cmdline - I just tested 5.3.0 and it fails too ( my last test was 5.3.8 and it failed also ) .
>>>>>>>>>
>>>>>>>>> # cat /proc/cmdline
>>>>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.0+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
>>>>>>>>>
>>>>>>>>> When you say “tip”. - Are you referring to a specific kernel ? I can build a 5.4.0.rc6 ; The problem appears to have been introduced around 5.3 time frame.
>>>>>>>>
>>>>>>>> The latest and greatest DRM code is in the drm-tip branch at
>>>>>>>>
>>>>>>>>   git://anongit.freedesktop.org/drm/drm-tip
>>>>>>>>
>>>>>>>> If you build this version you should find
>>>>>>>>
>>>>>>>>   /sys/kernel/debug/dri/0/vram-mm
>>>>>>>>
>>>>>>>> on the device. You have to build with debugfs enabled and
>>>>>>>> maybe have to mount debugfs at /sys/kernel/debug.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> before and after switching to graphics mode. The file lists the
>>>>>>>>>> allocated regions of the VRAM.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This adapter is Server Engines Integrated Remote Video Acceleration Subsystem (RVAS) and is used as remote console in iLO/DRAC environments.
>>>>>>>>>>>
>>>>>>>>>>> I don’t see any specific errors in the gdm logs or message file other than this:
>>>>>>>>>>
>>>>>>>>>> You can boot with drm.debug=0xff on the kernel command line to enable
>>>>>>>>>> more warnings.
>>>>>>>>>>
>>>>>>>>>> Could you please attach the output of lspci -v for the VGA adapter?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is the output from the current machine; The previous addresses were from another model using the same SE device:
>>>>>>>>>
>>>>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc5000000 -> 0xc5ffffff
>>>>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 1: 0xc6810000 -> 0xc6813fff
>>>>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xc6000000 -> 0xc67fffff
>>>>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: vgaarb: deactivate vga console
>>>>>>>>>
>>>>>>>>> lspci -s 3d:00.0 -vvv -k
>>>>>>>>> 3d:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
>>>>>>>>>   Subsystem: Oracle/SUN Device 4852
>>>>>>>>>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>>>>>>>>>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>   Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>   Interrupt: pin A routed to IRQ 16
>>>>>>>>>   NUMA node: 0
>>>>>>>>>   Region 0: Memory at c5000000 (32-bit, non-prefetchable) [size=16M]
>>>>>>>>>   Region 1: Memory at c6810000 (32-bit, non-prefetchable) [size=16K]
>>>>>>>>>   Region 2: Memory at c6000000 (32-bit, non-prefetchable) [size=8M]
>>>>>>>>>   Expansion ROM at 000c0000 [disabled] [size=128K]
>>>>>>>>>   Capabilities: [dc] Power Management version 2
>>>>>>>>>     Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>>>>>>>>     Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>   Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
>>>>>>>>>     DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>>>>>>>>>       ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>>>>>>>>>     DevCtl: Report errors:
>>>>>>>>>       Correctable+ Non-Fatal+ Fatal+ Unsupported-
>>>>>>>>>       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>       MaxPayload 128 bytes, MaxReadReq 128 bytes
>>>>>>>>>     DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>>>>>>>>     LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
>>>>>>>>>       ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>>>>>>>>>     LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>>>>>     LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>   Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
>>>>>>>>>     Address: 00000000 Data: 0000
>>>>>>>>>   Kernel driver in use: mgag200
>>>>>>>>>   Kernel modules: mgag200
>>>>>>>>
>>>>>>>> Looks all normal.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Thomas
>>>>>>>>
>>>>>>
>>>>>> ============== Snip ===========
>>>>>>
>>>>>> Hi Thomas,
>>>>>>
>>>>>> I hopefully narrowed down the breakage between these up-stream commits, which is v5.2 and 5.3.0-rc1:
>>>>>>
>>>>>> between : 0ecfebd2b524 2019-07-07 | Linux 5.2
>>>>>> to      : 5f9e832c1370 2019-07-21 | Linus 5.3-rc1
>>>>>>
>>>>>> I started to bisect this range on by date, by day , based on the changes done in :
>>>>>>
>>>>>> drivers/gpu/drm/
>>>>>>
>>>>>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma ; works
>>>>>>
>>>>>> Hopefully something in drivers/gpu/drm/ between the date range of 2019-07-14 to 2019-07-21 will surface tomorrow.
>>>>>
>>>>> Great, thanks for bisecting.
>>>>>
>>>>> Could you attach your kernel config file? I'd like to compare with my
>>>>> config and try to reproduce the issue.
>>>>>
>>>>> Best regards
>>>>> Thomas
>>>>
>>>> Hi.
>>>>
>>>> Here are config files generated after a “ make oldconfig “ that started with an original .config file from a master file we use for 5.4.0.-rc4.
>>>>
>>>> config.5.2.21 - work with that flavor
>>>> config.5.3. fails with 5.3 and later.
>>>>
>>>> Do you have access to mgag200 style adapter ?
>>>
>>> I do.
>>>
>>> I think I've been able to reproduce the issue. Buffers seem to remain in
>>> video ram after they have been pinned there. I'll investigate next week.
>>> I hope your bisecting session can point to the cause.
>>>
>>> Best regards
>>> Thomas
>>
>> Hi Thomas,
>>
>> Wonderful!
>>
>> I think I have narrowed down the merge to this build which is : vmlinuz-5.2.0-rc5+ :
>>
>> be8454afc50f 2019-07-15 | Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
>>
>> Specifically this merge included these two changes :
>>
>> 94dc57b10399 2019-06-13 | drm/mgag200: Rewrite cursor handling
>> f4ce5af71bc2 2019-06-13 | drm/mgag200: Pin framebuffer BO during dirty update
>>
>> I tried reverting them and the resultant driver doesn’t build afterwards due to drm calls.
>>
>> If I build a kernel from :
>>
>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
>>
>> That is posted day prior to be8454afc50f - the GNOME desktop works.
>
> I thought I could reproduce the problem, but I'm not so sure now.
>
> Please bisect the range between the two merges as described by Daniel to
> find the broken commit. Doing
>
>   git bisect start
>   git bisect bad be8454afc50f
>   git bisect good fec88ab0af97
>
> should start the session.

In short, I started with:

  git bisect start
  git bisect bad be8454afc50f
  git bisect good fec88ab0af97

At the end, the bisect showed this as the offending commit: c0a74c732568

commit c0a74c732568ad347f7b3de281922808dab30504 (refs/bisect/bad)
Author: Jani Nikula <jani.nikula@xxxxxxxxx>
Date:   Fri May 24 20:35:22 2019 +0300

    drm/i915: Update DRIVER_DATE to 20190524

    Signed-off-by: Jani Nikula <jani.nikula@xxxxxxxxx>

That commit does not have any real relevance to this problem. I am not sure if I did the bisects correctly.
After each test I did:

  #1  git bisect bad 827440a90146
  #2  git bisect bad f5b07b04e5f0
  #3  git bisect bad c0a74c732568
  #4  git bisect good 818f5cb3e8fb
  #5  git bisect good 6cfe7ec02e85
  #6  git bisect good f71e01a78bee
  #7  git bisect good 09a93ef3d60f
  #8  git bisect good f1e6b336bafa
  #9  git bisect good eaf20e6933dc
  #10 git bisect good 63e8dcdb4f8e
  #11 git bisect good 397049a03022

I’ve restarted the bisect without appending the <commit-id> after “bad|good”, and so far git is showing the same selections.

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
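
As a cross-check of the vram-mm listing quoted earlier in this thread, the per-region page counts can be summed and compared with the driver's own "total:" line. The sketch below embeds the graphics-mode listing exactly as posted. The 4096-byte page size is an assumption (the thread does not state it), and the variable names are purely illustrative.

```shell
# vram-mm listing copied from the thread (graphics mode, sizes in pages).
vram_mm='0x0000000000000000-0x0000000000000300: 768: used
0x0000000000000300-0x0000000000000600: 768: used
0x0000000000000600-0x00000000000007ee: 494: free
0x00000000000007ee-0x00000000000007ef: 1: used
0x00000000000007ef-0x00000000000007f0: 1: used'

# Fields are separated by ": "; field 2 is the size in pages, field 3 the state.
used=$(printf '%s\n' "$vram_mm" | awk -F': ' '$3 == "used" { n += $2 } END { print n + 0 }')
free=$(printf '%s\n' "$vram_mm" | awk -F': ' '$3 == "free" { n += $2 } END { print n + 0 }')
total=$((used + free))

# ASSUMPTION: 4096-byte pages; not stated anywhere in the thread.
page_size=4096
fb_bytes=$((768 * page_size))

echo "total: $total, used $used free $free"   # prints: total: 2032, used 1538 free 494
echo "one 768-page buffer = $fb_bytes bytes"  # prints: one 768-page buffer = 3145728 bytes
```

The sums agree with the "total: 2032, used 1538 free 494" line in the posted output. Under the assumed page size, each 768-page buffer is 3 MiB, which is what a 1024x768 framebuffer at 32 bits per pixel would occupy; the thread itself does not confirm the mode.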