System freeze: Debian Stable with C61 [GeForce 7025 / nForce 630a]

Hi, this may be an old issue.
I have copied at the bottom the last message of a previous thread about
this same machine.
Essentially, I get random freezes (the display looks like a snowstorm or
simply stops updating); nothing responds, and I'm forced to hard-reset
the machine.
Last time the solution was simply to remove nouveau_dri.so, and for what
seems to be a couple of years the system was rock-solid.
But a couple of days ago I did a system update (I'm on Debian Stable
Bullseye right now) and the problem apparently reappeared, but
worse/different: now it happens even without nouveau_dri.so present on
the system. (Meaning: I remove nouveau_dri.so and the freezes still
happen randomly.)
The hardware is the same, so I imagine it may be a kernel issue?
This is the hardware:

$ lspci
00:00.0 RAM memory: NVIDIA Corporation MCP61 Host Bridge (rev a1)
00:01.0 ISA bridge: NVIDIA Corporation MCP61 LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP61 SMBus (rev a2)
00:01.2 RAM memory: NVIDIA Corporation MCP61 Memory Controller (rev a2)
00:02.0 USB controller: NVIDIA Corporation MCP61 USB 1.1 Controller (rev a3)
00:02.1 USB controller: NVIDIA Corporation MCP61 USB 2.0 Controller (rev a3)
00:04.0 PCI bridge: NVIDIA Corporation MCP61 PCI bridge (rev a1)
00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2)
00:06.0 IDE interface: NVIDIA Corporation MCP61 IDE (rev a2)
00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2)
00:08.0 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:08.1 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 7025 / nForce 630a] (rev a2)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control

With:

$ uname -a
Linux debian 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux

And:

$ apt-cache policy xserver-xorg-video-nouveau
xserver-xorg-video-nouveau:
  Installed: 1:1.0.17-1
  Candidate: 1:1.0.17-1
  Version table:
 *** 1:1.0.17-1 500
        500 https://deb.debian.org/debian bullseye/main amd64 Packages
        100 /var/lib/dpkg/status

And this seems to be all the info I have:

$ sudo journalctl -S 2021-10-21 -x -p 4 | grep nouveau
oct 21 13:16:43 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 13:16:43 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 13:17:00 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 17:12:05 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 17:12:05 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 17:12:18 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 17:22:21 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 014b0001 FAULT at 00b010
oct 21 17:22:21 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 014b0001 FAULT at 00b010
oct 21 21:32:55 debian kernel:  autofs4 ext4 crc16 mbcache jbd2
crc32c_generic sd_mod t10_pi crc_t10dif crct10dif_generic
crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper
cec ttm ata_generic sata_nv drm libata scsi_mod psmouse serio_raw
evdev button
oct 21 22:22:53 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01f20001 FAULT at 00b020
oct 21 22:23:14 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b020
oct 21 22:23:23 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01f20001 FAULT at 00b020
oct 21 22:25:32 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 22:25:32 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 22:26:03 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 22:46:40 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 22:46:40 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 22:46:51 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 22:59:55 debian kernel:  crct10dif_common nouveau video mxm_wmi
wmi i2c_algo_bit drm_kms_helper ata_generic sata_nv cec libata ttm drm
scsi_mod psmouse evdev serio_raw button
oct 21 23:44:26 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01570001 FAULT at 00b010
oct 21 23:44:26 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01ef0001 FAULT at 00b020
oct 21 23:45:08 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02550001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01440001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02610001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 015e0001 FAULT at 00b030
oct 21 23:45:10 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02610001 FAULT at 00b030
oct 21 23:45:11 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01440001 FAULT at 00b030
oct 21 23:45:13 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02550001 FAULT at 00b030
oct 21 23:45:13 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01440001 FAULT at 00b030
oct 21 23:45:16 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02610001 FAULT at 00b040
oct 21 23:45:17 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02550001 FAULT at 00b040
oct 21 23:45:17 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b030
oct 21 23:45:24 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b040
oct 21 23:48:53 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 21 23:48:53 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 23:49:21 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 22 00:28:54 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 22 00:28:54 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 22 00:29:04 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 23 14:48:33 debian kernel:  parport_pc ppdev lp parport fuse
configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2
crc32c_generic sd_mod t10_pi crc_t10dif crct10dif_generic
crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper
cec ttm drm sata_nv ata_generic psmouse libata serio_raw scsi_mod
evdev button
oct 24 03:18:13 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 24 03:18:13 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 24 03:18:38 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 24 11:08:30 debian kernel: nouveau 0000:00:0d.0: DRM: DCB type 4 not known
oct 24 11:08:30 debian kernel: nouveau 0000:00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 24 11:09:04 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 24 11:12:01 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01740001 FAULT at 00b010
oct 24 11:12:02 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 018b0001 FAULT at 00b020
oct 24 11:12:18 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02150001 FAULT at 00b030
oct 24 11:12:23 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b030
oct 24 11:12:23 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02910001 FAULT at 00b030
oct 24 11:23:33 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b030
oct 24 12:33:35 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b010
oct 24 12:33:35 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b020
oct 24 12:33:35 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 24 12:33:45 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 014a0001 FAULT at 00b010
oct 24 12:33:45 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 018b0001 FAULT at 00b020
oct 24 13:09:37 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02290001 FAULT at 00b030
oct 24 13:09:38 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01710001 FAULT at 00b040
oct 24 13:09:38 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b030
oct 24 13:09:39 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b040
oct 24 13:09:39 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 02870001 FAULT at 00b030
oct 24 13:09:39 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 01570001 FAULT at 00b030
oct 24 13:09:42 debian kernel: nouveau 0000:00:0d.0: bus: MMIO write
of 00000000 FAULT at 00b030
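
In case it helps to see which addresses the faults cluster on, here is a
minimal sketch that tallies the "FAULT at <addr>" messages per address.
The function name count_mmio_faults is just a name picked for this
example, and it assumes the log text arrives on standard input. It only
counts messages; it says nothing about what causes them.

```shell
#!/bin/sh
# Tally nouveau "MMIO write ... FAULT at <addr>" messages per address.
# Reads log text on stdin and prints "<addr> <count>" lines, sorted by address.
# Only the "FAULT at <addr>" token is matched, so hard-wrapped text
# (as in the excerpt above) is counted correctly as well.
count_mmio_faults() {
    grep -o 'FAULT at [0-9a-f][0-9a-f]*' |
        awk '{ n[$3]++ } END { for (a in n) print a, n[a] }' |
        sort
}

# Example (not run here):
#   sudo journalctl -k -S 2021-10-21 | count_mmio_faults
```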

Sorry for not trimming anything, not sure what's useful and what's not.

Any hint?

As before, thanks A LOT in advance.

Best regards!


On 1/29/20, Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
> On Wed, Jan 29, 2020 at 5:03 AM riveravaldez <riveravaldezmail@xxxxxxxxx>
> wrote:
>>
>> On 12/11/18, Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
>> > On Tue, Dec 11, 2018 at 11:16 AM riveravaldez
>> > <riveravaldezmail@xxxxxxxxx> wrote:
>> >
>> >> The freezes appear randomly, in every situation, and not when I
>> >> launch some 3D applications or anything similar.
>> >
>> > Try removing nouveau_dri.so -- that will ensure no 3d accel is used,
>> > while keeping your 2d accel provided by the nouveau ddx.
>>
>> Sorry if it's wrong to continue this old thread, but after a good
>> amount of testing (+1 year) I can confirm that both the problem and
>> the solution were the ones mentioned.
>>
>> The problem (random full-system freezes) persists without change,
>> identical. And removing nouveau_dri.so from
>> /usr/lib/x86_64-linux-gnu/dri/ effectively fixes it completely
>> (leaving aside any loss of performance and some warning messages
>> during system upgrades and when launching programs[1]).
>>
>> So, after a GREAT thank-you to Ilia, I'd like to ask:
>>
>> 1. Is this something that could be fixed? Can I do anything to help?
>>
>> 2. If the only possible/viable solution is the mentioned one (remove
>> nouveau_dri.so), which would be the proper way to make it permanent?
>>
>> 2'. In many dist-upgrades the nouveau_dri.so file is re-created in the
>> same folder, what would be a clean/neat way to handle this?
>>
>> Thanks A LOT again.
>>
>> [1] A lot of lines like these on some dist-upgrades:
>>
>> W: Possible missing firmware
>> /lib/firmware/nvidia/gp100/gr/sw_method_init.bin for module nouveau
>> W: Possible missing firmware
>> /lib/firmware/nvidia/gp100/gr/sw_bundle_init.bin for module nouveau
>> W: Possible missing firmware
>> /lib/firmware/nvidia/gp100/gr/sw_nonctx.bin for module nouveau
>> (...)
>
> Sounds like your initramfs builder tries to include these but they're
> not available on your filesystem. As long as you're not plugging a
> Pascal GPU into your system, you're fine.
>
>>
>> And a lot of programs producing messages like these on start:
>>
>> libGL error: unable to load driver: nouveau_dri.so
>> libGL error: driver pointer missing
>> libGL error: failed to load driver: nouveau
>
> Hmmmm annoying. I hadn't considered that. I could add an option to the
> DDX which makes the default driver "swrast" or something. I also
> wonder if just not loading the "glx" and "dri2" X modules would be
> sufficient to get rid of these.
>
> You can also stick LIBGL_ALWAYS_SOFTWARE=1 into your /etc/environment
> (or whatever location causes that env var to appear everywhere) which
> will force it to use swrast. (With the added benefit of being able to
> unset it for the programs where you really do want 3d accel.)
>
> As for a more permanent fix, one could invest developer attention to
> the nv30 gallium driver, but that one would first have to be located.
> I'd be happy to provide some limited mentoring in such a case.
>
> Cheers,
>
>   -ilia
>
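
For reference, a minimal sketch of the LIBGL_ALWAYS_SOFTWARE approach
quoted above. It assumes a line reading LIBGL_ALWAYS_SOFTWARE=1 has been
added to /etc/environment, and it shows how a single program could still
be started with the variable unset. The helper name with_hw_gl is
hypothetical, and I have not verified whether this setup avoids the
freezes on this machine.

```shell
#!/bin/sh
# Run one command with LIBGL_ALWAYS_SOFTWARE removed from its environment,
# leaving the global setting (e.g. from /etc/environment) untouched.
# "with_hw_gl" is a hypothetical helper name used only in this sketch.
with_hw_gl() {
    # The subshell keeps the unset from leaking into the calling shell.
    ( unset LIBGL_ALWAYS_SOFTWARE; exec "$@" )
}

# Example (not run here):
#   with_hw_gl glxgears
```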



