Hi all, śr., 10 lis 2021 o 15:14 Bjorn Helgaas <helgaas@xxxxxxxxxx> napisał(a): > On Tue, Nov 09, 2021 at 02:05:18PM -0600, Bjorn Helgaas wrote: > > On Tue, Nov 09, 2021 at 07:58:47PM +0100, Rafael J. Wysocki wrote: > > > On Tue, Nov 9, 2021 at 7:52 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > > > ... > > > > > So instead, we can drop the pm_runtime_get_sync() and > > > > pm_runtime_put_sync() from local_pci_probe() and pci_device_remove(), > > > > respectively, and add pm_runtine_get_noresume() to pci_pm_init(), > > > > which will prevent PM-runtime from touching the device until it has a > > > > driver that supports PM-runtime. > > > > > > > > We'll lose the theoretical ability to put unbound devices into D3 this > > > > way, but we learned some time ago that this isn't safe in all cases > > > > anyway. > > > > > > IOW, something like this (untested and most likely white-space-damaged). > > > > Thanks! I applied this manually to for-linus in hopes of making the > > the next linux-next build. > > > > Please send any testing reports and corrections to the patch and > > commit log! > > Robert, I hate to ask even more of you, but if you have a chance, it > would be very helpful if you could test the patch below. I'm pretty > sure it should fix the problem you saw, and I hope to ask Linus to > merge it today. I think the most recent patch creates some timeouts and other problems in pci-related code? Things I haven't seen before. But, granted, my kernel testing approach is not with focus on details, so maybe I did sth wrong. code is now at: cb690f5238d71f543f4ce874aa59237cf53a877c https://github.com/torvalds/linux/commit/cb690f5238d71f543f4ce874aa59237cf53a877c lis 10 17:26:41 jd kernel: Linux version 5.15.0+ (jagger@jd) (gcc (Debian 11.2.0-10) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.50.20211102) #105 SMP PREEMPT Wed Nov 10 17:23:49 CET 2021 lis 10 17:26:41 jd kernel: Command line: BOOT_IMAGE=/vmlinuz-5.15.0+ root=UUID=8759fa14-93a4-4dc1-87e6-aa6f5cdbb2ff ro nosplash mitigations=off no_file_caps apparmor=0 selinux=0 audit=0 amdgpu.ppfeaturemask=0xffffffff hugepagesz=1G default_hugepagesz=1G hugepages=8 amd_iommu=on iommu=pt vfio-pci.ids=10de:1f02,10de:10f9,10de:1ada,10de:1adb drm.edid_firmware=DP-1:edid/sam-g9.edid isolcpus=1,2,5,6,17,18,21,22 .. lis 10 17:26:41 jd kernel: usb usb5: runtime PM trying to activate child device usb5 but parent (0000:0c:00.2) is not active .. lis 10 17:26:41 jd kernel: i2c-designware-pci 0000:0c:00.3: timeout in disabling adapter lis 10 17:26:41 jd kernel: i2c-designware-pci 0000:0c:00.3: timeout in disabling adapter .. lis 10 17:26:41 jd kernel: usb 1-5.3: config 1 has an invalid interface number: 2 but max is 1 lis 10 17:26:41 jd kernel: usb 1-5.3: config 1 has no interface number 1 .. lis 10 17:26:41 jd kernel: i2c-designware-pci 0000:0c:00.3: timeout in disabling adapter .. lis 10 17:26:41 jd kernel: i2c-designware-pci 0000:0c:00.3: timeout in disabling adapter lis 10 17:26:41 jd kernel: i2c-designware-pci 0000:0c:00.3: i2c timeout error -110 lis 10 17:26:41 jd kernel: ucsi_ccg 3-0008: i2c_transfer failed -110 lis 10 17:26:41 jd kernel: ucsi_ccg 3-0008: ucsi_ccg_init failed - -110 lis 10 17:26:41 jd kernel: ucsi_ccg: probe of 3-0008 failed with error -110 lis 10 17:26:42 jd kernel: snd_hda_intel 0000:0c:00.1: azx_get_response timeout, switching to polling mode: last cmd=0x000f0000 lis 10 17:26:42 jd kernel: hdaudio hdaudioC0D0: runtime PM trying to activate child device hdaudioC0D0 but parent (0000:0c:00.1) is not a> lis 10 17:26:42 jd kernel: snd_hda_codec_hdmi hdaudioC0D0: HDMI: failed to get afg sub nodes lis 10 17:26:42 jd kernel: snd_hda_codec_hdmi: probe of hdaudioC0D0 failed with error -22 lis 10 17:26:42 jd kernel: snd_hda_codec_hdmi hdaudioC0D0: HDMI: failed to get afg sub nodes lis 10 17:26:42 jd kernel: snd_hda_codec_hdmi: probe of hdaudioC0D0 failed with error -22 lis 10 17:26:42 jd kernel: snd_hda_codec_hdmi hdaudioC0D0: HDMI: failed to get afg sub nodes lis 10 17:26:42 jd kernel: snd_hda_codec_hdmi: probe of hdaudioC0D0 failed with error -22 ... lis 10 17:26:42 jd kernel: snd_hda_codec_generic: probe of hdaudioC0D0 failed with error -5 lis 10 17:26:42 jd kernel: snd_hda_intel 0000:0c:00.1: Cannot probe codecs, giving up lis 10 17:26:42 jd kernel: ------------[ cut here ]------------ lis 10 17:26:42 jd kernel: WARNING: CPU: 0 PID: 206 at sound/hda/hdac_bus.c:73 snd_hdac_bus_exit+0x3b/0x80 [snd_hda_core] lis 10 17:26:42 jd kernel: Modules linked in: snd_hda_codec_hdmi(E) nls_iso8859_2(E) snd_hda_codec_realtek(E) nls_cp852(E) snd_hda_codec_> lis 10 17:26:42 jd kernel: i2c_designware_pci(E) drm(E) i2c_piix4(E) i2c_designware_core(E) xhci_pci(E) backlight(E) ice(E) xhci_pci_ren> lis 10 17:26:42 jd kernel: CPU: 0 PID: 206 Comm: kworker/0:2 Tainted: G E 5.15.0+ #105 lis 10 17:26:42 jd kernel: Hardware name: ASUS System Product Name/ROG CROSSHAIR VIII FORMULA, BIOS 3901 09/07/2021 lis 10 17:26:42 jd kernel: Workqueue: events azx_probe_work [snd_hda_intel] lis 10 17:26:42 jd kernel: RIP: 0010:snd_hdac_bus_exit+0x3b/0x80 [snd_hda_core] lis 10 17:26:42 jd kernel: Code: 75 19 48 8b 57 58 48 8d 47 58 48 39 c2 75 1b 48 81 c7 f8 02 00 00 e9 94 61 10 d8 0f 0b 48 8b 57 58 48 8d> lis 10 17:26:42 jd kernel: RSP: 0018:ffffa3e1008d3e40 EFLAGS: 00010283 lis 10 17:26:42 jd kernel: RAX: ffff9728c5f38080 RBX: ffff9728c5f38028 RCX: 000000008020001f lis 10 17:26:42 jd kernel: RDX: ffff9728c1f40b00 RSI: ffffffff9a30ddc9 RDI: ffff9728c5f38028 lis 10 17:26:42 jd kernel: RBP: ffff9728c5f384d0 R08: 0000000000000000 R09: ffff9728c14a4e00 lis 10 17:26:42 jd kernel: R10: 0000000000000000 R11: 0000000000000005 R12: 00000000ffffffed lis 10 17:26:42 jd kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff972fcea26405 lis 10 17:26:42 jd kernel: FS: 0000000000000000(0000) GS:ffff972fcea00000(0000) knlGS:0000000000000000 lis 10 17:26:42 jd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 lis 10 17:26:42 jd kernel: CR2: 00007fa43a298cb0 CR3: 000000010e58e000 CR4: 0000000000750ef0 lis 10 17:26:42 jd kernel: PKRU: 55555554 lis 10 17:26:42 jd kernel: Call Trace: lis 10 17:26:42 jd kernel: <TASK> lis 10 17:26:42 jd kernel: azx_free+0xe5/0x1c0 [snd_hda_intel] lis 10 17:26:42 jd kernel: azx_probe_continue+0x1a8/0x300 [snd_hda_intel] lis 10 17:26:42 jd kernel: process_one_work+0x1eb/0x380 lis 10 17:26:42 jd kernel: worker_thread+0x48/0x400 lis 10 17:26:42 jd kernel: ? rescuer_thread+0x3c0/0x3c0 lis 10 17:26:42 jd kernel: kthread+0x151/0x180 lis 10 17:26:42 jd kernel: ? set_kthread_struct+0x40/0x40 lis 10 17:26:42 jd kernel: ret_from_fork+0x1f/0x30 lis 10 17:26:42 jd kernel: </TASK> lis 10 17:26:42 jd kernel: ---[ end trace f522829b4b020462 ]--- lis 10 17:26:42 jd kernel: i2c-designware-pci 0000:0c:00.3: timeout in disabling adapter > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > > index 1d98c974381c..41cdf510214f 100644 > > --- a/drivers/pci/pci-driver.c > > +++ b/drivers/pci/pci-driver.c > > @@ -309,16 +309,6 @@ static long local_pci_probe(void *_ddi) > > struct device *dev = &pci_dev->dev; > > int rc; > > > > - /* > > - * Unbound PCI devices are always put in D0, regardless of > > - * runtime PM status. During probe, the device is set to > > - * active and the usage count is incremented. If the driver > > - * supports runtime PM, it should call pm_runtime_put_noidle(), > > - * or any other runtime PM helper function decrementing the usage > > - * count, in its probe routine and pm_runtime_get_noresume() in > > - * its remove routine. > > - */ > > - pm_runtime_get_sync(dev); > > rc = pci_drv->probe(pci_dev, ddi->id); > > if (!rc) > > return rc; > > @@ -464,9 +454,6 @@ static void pci_device_remove(struct device *dev) > > pcibios_free_irq(pci_dev); > > pci_iov_remove(pci_dev); > > > > - /* Undo the runtime PM settings in local_pci_probe() */ > > - pm_runtime_put_sync(dev); > > - > > /* > > * If the device is still on, set the power state as "unknown", > > * since it might change by the next time we load the driver. > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index b88db815ee01..e9c38b994c73 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -3097,7 +3097,15 @@ void pci_pm_init(struct pci_dev *dev) > > u16 pmc; > > > > pm_runtime_forbid(&dev->dev); > > + > > + /* > > + * Unbound PCI devices are always put in D0. If the driver supports > > + * runtime PM, it should call pm_runtime_put_noidle(), or any other > > + * runtime PM helper function decrementing the usage count, in its > > + * probe routine and pm_runtime_get_noresume() in its remove routine. > > + */ > > pm_runtime_set_active(&dev->dev); > > + pm_runtime_get_noresume(&dev->dev); > > pm_runtime_enable(&dev->dev); > > device_enable_async_suspend(&dev->dev); > > dev->wakeup_prepared = false; -- Robert Święcki