Hi > -----Original Message----- > From: Karol Herbst <kherbst@xxxxxxxxxx> > Sent: Monday, November 7, 2022 3:42 AM > To: Timothy Madden <terminatorul@xxxxxxxxx> > Cc: nouveau@xxxxxxxxxxxxxxxxxxxxx; Ajay Gupta <ajayg@xxxxxxxxxx> > Subject: Re: Fans ramping up randomly when idle > > External email: Use caution opening links or attachments > > > On Sat, Nov 5, 2022 at 8:36 PM Timothy Madden <terminatorul@xxxxxxxxx> > wrote: > > > > Hello > > > > My Msi Gaming X Trio 2080 Ti randomly ramps up the fans with no way to > > recover (I have to reboot) even when the card is idle or is only showing the > desktop. > > > > This issue happens even when the card is not connected to a monitor. > > > > My dmesg output from nouveau is included below, I think the last 2 > > lines are the relevant ones: > > [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to change power state > > from D3hot to D0, device inaccessible [ 9427.889387] nvidia-gpu > > 0000:0b:00.3: i2c timeout error ffffffff This only implies that there is no usb/ucsi device on the card, it is expected from such cards and should be seen in dmesg even when heating issue is not there. Thanks >nvpublic > > > > > > that's kind of odd, because "nvidia-gpu" implies you might have multiple > drivers here? Though .3 should be some USB/UCSI or something related sub > device on the GPU and Nvidia might have messed it up (adding the > maintainer of the i2c-nvidia-gpu driver on CC). > > Anyway, the fans are probably controlled by the Laptops firmware and > maybe something goes wrong with the runtime power management feature > here, which as far as I can tell works on the Nouveau side, but i2c-nvidia-gpu > might prevent the GPU from powering done and so causing more heat. It's > also interesting that the GPU runs that hot, but given we don't support > changing power states yet in Nouveau (still WIP wiring up the new released > firmware from nvidia), not much we can do while the GPU is actually in use at > this point. > > > > > > > timothy@localhost:~> dmesg | grep -i -e nouveau -e nvidia > > [ 6.511064] nouveau 0000:0b:00.0: NVIDIA TU102 (162000a1) > > [ 6.594464] nouveau 0000:0b:00.0: bios: version 90.02.42.00.14 > > [ 6.597756] nouveau 0000:0b:00.0: pmu: firmware unavailable > > [ 6.601947] nouveau 0000:0b:00.0: fb: 11264 MiB GDDR6 > > [ 6.618463] nouveau 0000:0b:00.0: DRM: VRAM: 11264 MiB > > [ 6.618465] nouveau 0000:0b:00.0: DRM: GART: 536870912 MiB > > [ 6.618466] nouveau 0000:0b:00.0: DRM: BIT table 'A' not found > > [ 6.618468] nouveau 0000:0b:00.0: DRM: BIT table 'L' not found > > [ 6.618469] nouveau 0000:0b:00.0: DRM: TMDS table version 2.0 > > [ 6.618470] nouveau 0000:0b:00.0: DRM: DCB version 4.1 > > [ 6.618471] nouveau 0000:0b:00.0: DRM: DCB outp 00: 02800f66 04600020 > > [ 6.618473] nouveau 0000:0b:00.0: DRM: DCB outp 01: 02000f62 00020020 > > [ 6.618474] nouveau 0000:0b:00.0: DRM: DCB outp 03: 02011f52 00020010 > > [ 6.618475] nouveau 0000:0b:00.0: DRM: DCB outp 04: 04822f76 04600010 > > [ 6.618476] nouveau 0000:0b:00.0: DRM: DCB outp 05: 04022f72 00020010 > > [ 6.618477] nouveau 0000:0b:00.0: DRM: DCB outp 08: 01844f36 04600010 > > [ 6.618478] nouveau 0000:0b:00.0: DRM: DCB outp 09: 01044f32 00020010 > > [ 6.618479] nouveau 0000:0b:00.0: DRM: DCB outp 10: 04833f86 04600020 > > [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 00: 00020046 > > [ 6.618481] nouveau 0000:0b:00.0: DRM: DCB conn 01: 00010161 > > [ 6.618482] nouveau 0000:0b:00.0: DRM: DCB conn 02: 01000246 > > [ 6.618483] nouveau 0000:0b:00.0: DRM: DCB conn 03: 02000371 > > [ 6.618484] nouveau 0000:0b:00.0: DRM: DCB conn 04: 00001446 > > [ 6.620448] nouveau 0000:0b:00.0: DRM: MM: using COPY for buffer > copies > > [ 7.062338] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > > [ 7.065331] [drm] Initialized nouveau 1.3.1 20120801 for 0000:0b:00.0 on > minor 1 > > [ 7.254317] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > > [ 7.446318] nouveau 0000:0b:00.0: [drm] Cannot find any crtc or sizes > > [ 8.501252] nvidia-gpu 0000:0b:00.3: enabling device (0000 -> 0002) > > [ 8.696138] audit: type=1400 audit(1667665884.700:5): > apparmor="STATUS" operation="profile_load" profile="unconfined" > name="nvidia_modprobe" pid=926 comm="apparmor_parser" > > [ 8.696141] audit: type=1400 audit(1667665884.700:6): > apparmor="STATUS" operation="profile_load" profile="unconfined" > name="nvidia_modprobe//kmod" pid=926 comm="apparmor_parser" > > [ 8.704333] snd_hda_intel 0000:0b:00.1: bound 0000:0b:00.0 (ops > nv50_audio_component_bind_ops [nouveau]) > > [ 8.708797] input: HDA NVidia HDMI/DP,pcm=3 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input15 > > [ 8.708903] input: HDA NVidia HDMI/DP,pcm=7 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input16 > > [ 8.708936] input: HDA NVidia HDMI/DP,pcm=8 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input17 > > [ 8.708965] input: HDA NVidia HDMI/DP,pcm=9 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input18 > > [ 8.708994] input: HDA NVidia HDMI/DP,pcm=10 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input19 > > [ 8.709032] input: HDA NVidia HDMI/DP,pcm=11 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input20 > > [ 8.709065] input: HDA NVidia HDMI/DP,pcm=12 as > /devices/pci0000:00/0000:00:03.2/0000:0b:00.1/sound/card1/input21 > > [ 10.776280] nouveau 0000:0b:00.0: vgaarb: changed VGA decodes: > olddecodes=io+mem,decodes=none:owns=none > > [ 3275.720190] nouveau 0000:0b:00.0: therm: temperature (90 C) hit the > > 'fanboost' threshold [ 9426.768449] nvidia-gpu 0000:0b:00.3: Unable to > > change power state from D3hot to D0, device inaccessible [ > > 9427.889387] nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff > > timothy@localhost:~> > >