On Friday, June 21, 2013 10:36:05 AM Alexander E. Patrakov wrote: > 2013/6/21 Rafael J. Wysocki <rjw@xxxxxxx>: > > On Wednesday, June 19, 2013 11:18:41 AM Alexander E. Patrakov wrote: > >> 2013/6/19 Rafael J. Wysocki <rjw@xxxxxxx>: > >> > OK, let's try to untangle this a bit. > >> > > >> > If you applyt patches [1/4] and [4/4] from the $subject series only, what > >> > does remain unfixed? > >> > >> [not tested, can do so in 12 hours if needed] > >> > >> I think there will be problems on undocking and/or on the second > >> docking, as described in comments #6 - #8 of > >> https://bugzilla.kernel.org/show_bug.cgi?id=59501 > > > > OK, I think I have something that might work. It may not solve all problems, > > but maybe it helps a bit. Unfortunately, I can't really test it, so please do > > if you can. > > > > Please apply [1/4] and [4/4] and the one below and see what happens. > > Tested on top of 3.10-rc6. > > Attached dmesg output for the following testcase: boot undocked, dock, > undock, dock. > > The initial dock went OK. The subsequent undock resulted in the blue > led on the dock cable turning off quickly, but in PCI devices slowly, > one-by-one, disappearing from the bus. Also, there were "acpi_handle > corrupt" messages in dmesg. The subsequent dock resulted in no devices > added to the bus. So - your patch is not a good replacement for > patches 2 and 3 in the original series. Well, this particular patch maybe not, but I like the direction better. Here's the relevant piece of your dmesg: [ 43.635516] ACPI: \_SB_.DOCK: undocking [ 44.108267] acpiphp_glue: _handle_hotplug_event_func: Device eject notify on \_SB_.PCI0.RP07.LPMB [ 44.110349] xhci_hcd 0000:1b:00.0: remove, state 4 [ 44.110497] usb usb6: USB disconnect, device number 1 [ 44.112203] port1: Oops, 'acpi_handle' corrupt [ 44.112242] port2: Oops, 'acpi_handle' corrupt What happens here is that USB ports are in the list of dock dependent devices, but they don't have the ops pointer set, so dock_remove_acpi_device() will be called for them *before* the usual USB teardown is triggered from the acpiphp_glue code for their parent. I *think* the solution may be to simply skip those things from ds->dependent_list (as they depend on something already in that list anyway). I'll attach an updated patch with that change to the bug entry. [ 44.112332] xHCI xhci_drop_endpoint called for root hub [ 44.112334] xHCI xhci_check_bandwidth called for root hub [ 44.114313] usb usb6: Oops, 'acpi_handle' corrupt Same thing again. [ 44.114349] xhci_hcd 0000:1b:00.0: Host not halted after 16000 microseconds. [ 44.114351] xhci_hcd 0000:1b:00.0: USB bus 6 deregistered [ 44.114360] xhci_hcd 0000:1b:00.0: remove, state 4 [ 44.114392] usb usb5: USB disconnect, device number 1 [ 44.114394] usb 5-1: USB disconnect, device number 2 [ 49.120740] xhci_hcd 0000:1b:00.0: Timeout while waiting for configure endpoint command [ 49.120750] xhci_hcd 0000:1b:00.0: Stopped the command ring failed, maybe the host is dead [ 49.120757] xhci_hcd 0000:1b:00.0: Host not halted after 16000 microseconds. [ 49.120759] xhci_hcd 0000:1b:00.0: Abort command ring failed [ 49.120760] xhci_hcd 0000:1b:00.0: HC died; cleaning up [ 49.122075] xHCI xhci_drop_endpoint called for root hub [ 49.122076] xHCI xhci_check_bandwidth called for root hub [ 49.123825] usb usb5: Oops, 'acpi_handle' corrupt Same thing again. [ 49.123906] xhci_hcd 0000:1b:00.0: Host not halted after 16000 microseconds. [ 49.124208] xhci_hcd 0000:1b:00.0: USB bus 5 deregistered [ 52.150756] ata8.00: disabled [ 52.172219] r8169 0000:19:00.0 enp25s0: rtl_counters_cond == 1 (loop: 1000, delay: 10). [ 52.182516] r8169 0000:19:00.0 enp25s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100). [ 52.188939] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25). [ 52.189542] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25). [ 52.190191] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25). [ 52.190746] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25). [ 52.191299] r8169 0000:19:00.0 enp25s0: rtl_phyar_cond == 1 (loop: 20, delay: 25). [ 52.212009] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100). [ 52.222185] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100). [ 52.232357] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100). [ 52.242529] r8169 0000:19:00.0 (unregistered net_device): rtl_eriar_cond == 1 (loop: 100, delay: 100). [ 52.260158] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=none,decodes=io+mem:owns=none [ 52.265799] [drm] radeon: finishing device. [ 52.265804] [drm] Disabling audio support [ 52.267668] radeon 0000:16:00.0: ffff880251e28400 unpin not necessary [ 52.268033] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown [ 52.268164] [TTM] Finalizing pool allocator [ 52.268340] [TTM] Finalizing DMA pool allocator The thing below looks like a radeon problem? [ 52.268389] ------------[ cut here ]------------ [ 52.268395] WARNING: at drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:533 ttm_dma_free_pool+0x101/0x110 [ttm]() [ 52.268396] Modules linked in: ata_generic pata_acpi pata_marvell radeon ttm zram(C) rfcomm bnep rtsx_pci_ms rtsx_pci_sdmmc mmc_core memstick iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm joydev pcspkr i2c_i801 uvcvideo qcserial usb_wwan videobuf2_vmalloc videobuf2_memops videobuf2_core usbserial btusb videodev bluetooth arc4 media iwldvm mac80211 snd_hda_codec_hdmi iwlwifi r8169 rtsx_pci mii cfg80211 snd_hda_codec_realtek i915 sony_laptop rfkill intel_agp snd_hda_intel intel_gtt snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm snd_page_alloc drm snd_timer agpgart lpc_ich snd mfd_core sha256_ssse3 sha256_generic dm_crypt raid0 md_mod crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd xhci_hcd ehci_pci ehci_hcd dm_mirror [ 52.268444] dm_region_hash dm_log dm_mod [last unloaded: microcode] [ 52.268449] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G C 3.10.0-rc6-rafael #1 [ 52.268450] Hardware name: Sony Corporation VPCZ23A4R/VAIO, BIOS R1013H5 05/21/2012 [ 52.268455] Workqueue: kacpi_hotplug _handle_hotplug_event_func [ 52.268456] ffffffffa081f268 ffff8802540d38b8 ffffffff8165af98 ffff8802540d38f8 [ 52.268460] ffffffff8103c8cb ffff8802540d38c8 ffff880251f53700 ffff8802513e23c0 [ 52.268463] 0000000000000008 ffff880253d02530 ffff880254079730 ffff8802540d3908 [ 52.268466] Call Trace: [ 52.268470] [<ffffffff8165af98>] dump_stack+0x19/0x1b [ 52.268474] [<ffffffff8103c8cb>] warn_slowpath_common+0x6b/0xa0 [ 52.268476] [<ffffffff8103c915>] warn_slowpath_null+0x15/0x20 [ 52.268480] [<ffffffffa081cd01>] ttm_dma_free_pool+0x101/0x110 [ttm] [ 52.268484] [<ffffffffa081dee5>] ttm_dma_page_alloc_fini+0x85/0xcc [ttm] [ 52.268487] [<ffffffffa0813455>] ttm_mem_global_release+0x25/0xa0 [ttm] [ 52.268497] [<ffffffffa08494ed>] radeon_ttm_mem_global_release+0xd/0x10 [radeon] [ 52.268507] [<ffffffffa0176213>] drm_global_item_unref+0x63/0x90 [drm] [ 52.268514] [<ffffffffa084a721>] radeon_ttm_fini+0xd1/0xe0 [radeon] [ 52.268522] [<ffffffffa084b129>] radeon_bo_fini+0x9/0x10 [radeon] [ 52.268531] [<ffffffffa0893bf1>] evergreen_fini+0x91/0xc0 [radeon] [ 52.268537] [<ffffffffa08336ea>] radeon_device_fini+0x3a/0xf0 [radeon] [ 52.268543] [<ffffffffa08353d1>] radeon_driver_unload_kms+0x41/0x70 [radeon] [ 52.268550] [<ffffffffa016524e>] drm_put_dev+0x6e/0x210 [drm] [ 52.268555] [<ffffffffa0832068>] radeon_pci_remove+0x18/0x20 [radeon] [ 52.268557] [<ffffffff81385831>] pci_device_remove+0x41/0xc0 [ 52.268561] [<ffffffff8143bfd7>] __device_release_driver+0x77/0xe0 [ 52.268563] [<ffffffff8143c069>] device_release_driver+0x29/0x40 [ 52.268566] [<ffffffff8143ba71>] bus_remove_device+0xf1/0x140 [ 52.268568] [<ffffffff8143915d>] device_del+0x11d/0x1b0 [ 52.268572] [<ffffffff8138082c>] pci_stop_bus_device+0x9c/0xb0 [ 52.268574] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0 [ 52.268576] [<ffffffff813a11e5>] ? acpiphp_disable_slot+0x35/0x140 [ 52.268579] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0 [ 52.268581] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0 [ 52.268584] [<ffffffff813807cb>] pci_stop_bus_device+0x3b/0xb0 [ 52.268586] [<ffffffff81380991>] pci_stop_and_remove_bus_device+0x11/0x20 [ 52.268588] [<ffffffff813a1236>] acpiphp_disable_slot+0x86/0x140 [ 52.268591] [<ffffffff813a1622>] _handle_hotplug_event_func+0x102/0x1e0 [ 52.268594] [<ffffffff8105c2c2>] process_one_work+0x1c2/0x560 [ 52.268597] [<ffffffff8105c257>] ? process_one_work+0x157/0x560 [ 52.268599] [<ffffffff8105d1d6>] worker_thread+0x116/0x370 [ 52.268601] [<ffffffff8105d0c0>] ? manage_workers.isra.20+0x2d0/0x2d0 [ 52.268604] [<ffffffff81063a36>] kthread+0xd6/0xe0 [ 52.268606] [<ffffffff816611ab>] ? _raw_spin_unlock_irq+0x2b/0x60 [ 52.268609] [<ffffffff81063960>] ? __init_kthread_worker+0x70/0x70 [ 52.268612] [<ffffffff8166856c>] ret_from_fork+0x7c/0xb0 [ 52.268615] [<ffffffff81063960>] ? __init_kthread_worker+0x70/0x70 [ 52.268669] ---[ end trace 46b4e977738a3df6 ]--- [ 52.269258] [TTM] Zone kernel: Used memory at exit: 13 kiB [ 52.269266] [TTM] Zone dma32: Used memory at exit: 9 kiB [ 52.269295] [drm] radeon: ttm finalized But it looks like the removal actually succeeded. [ 52.273103] pci_bus 0000:0b: busn_res: [bus 0b] is released [ 52.273732] pci_bus 0000:0c: busn_res: [bus 0c] is released [ 52.273816] pci_bus 0000:16: busn_res: [bus 16] is released Is the re-dock attempt included? It doesn't seem to leave any trace ... Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html