On Tue, Aug 30, 2016 at 12:08:57PM +0200, Roland Singer wrote:
> Thanks for pointing it out.
>
> Yeah, that's right. The system hangs at a random point a few minutes
> later, when certain actions in the graphical user session trigger the
> freeze.
>
> I had a look at the body of pci_bus_read_config_dword(), which
> pci_read_config_dword() calls:
>
> #define PCI_OP_READ(size, type, len) \
> int pci_bus_read_config_##size \
> 	(struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
> { \
> 	int res; \
> 	unsigned long flags; \
> 	u32 data = 0; \
> 	if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
> 	raw_spin_lock_irqsave(&pci_lock, flags); \
> 	res = bus->ops->read(bus, devfn, pos, len, &data); \
> 	*value = (type)data; \
> 	raw_spin_unlock_irqrestore(&pci_lock, flags); \
> 	return res; \
> }
>
> I guess bus->ops->read(...) might be the trigger.
> Any hints on how to continue debugging?

It's not likely that the problem is in the bus->ops->read() path. That
path is used by every device driver, so a problem there would cause
much more serious problems than what you're seeing.

My guess would be a problem in the video driver or in bbswitch.

> Am 30.08.2016 um 01:54 schrieb Bjorn Helgaas:
> > On Mon, Aug 29, 2016 at 09:55:56PM +0200, Roland Singer wrote:
> >> I just tried it and the system didn't freeze. However, it still
> >> freezes after some time (a few minutes while working).
> >>
> >> It seems to be pci_read_config_dword. Where exactly is this defined?
> >
> > pci_read_config_dword() is defined in include/linux/pci.h. It calls
> > pci_bus_read_config_dword(), which is defined by the PCI_OP_READ()
> > macro in drivers/pci/access.c.
> >
> > If I understand correctly, this:
> >
> > 	dis_dev_get();
> > 	pci_read_config_dword(dis_dev, 0, &cfg_word);
> > 	dis_dev_put();
> >
> > causes an immediate system hang, but if you only do this:
> >
> > 	dis_dev_get();
> > 	dis_dev_put();
> >
> > the system hangs a few minutes later. Right?
> >
> >> Am 29.08.2016 um 21:07 schrieb Bjorn Helgaas:
> >>> On Mon, Aug 29, 2016 at 08:46:17PM +0200, Roland Singer wrote:
> >>>> Hi Bjorn,
> >>>>
> >>>> I am using the bbswitch kernel module to switch the GPU off/on and
> >>>> to obtain the GPU power state.
> >>>> Obtaining the GPU state immediately after starting the graphical user
> >>>> session freezes the system.
> >>>>
> >>>> This code triggers something that is responsible for the freeze.
> >>>>
> >>>> ---
> >>>> // Returns 1 if the card is disabled, 0 if enabled
> >>>> static int is_card_disabled(void) {
> >>>> 	u32 cfg_word;
> >>>> 	// read first config word which contains Vendor and Device ID. If all bits
> >>>> 	// are enabled, the device is assumed to be off
> >>>> 	pci_read_config_dword(dis_dev, 0, &cfg_word);
> >>>> 	// if one of the bits is not enabled (the card is enabled), the inverted
> >>>> 	// result will be non-zero and hence logical not will make it 0 ("false")
> >>>> 	return !~cfg_word;
> >>>> }
> >>>>
> >>>> static int bbswitch_proc_show(struct seq_file *seqfp, void *p) {
> >>>> 	// show the card state. Example output: 0000:01:00.0 ON
> >>>> 	dis_dev_get();
> >>>> 	seq_printf(seqfp, "%s %s\n", dev_name(&dis_dev->dev),
> >>>> 		is_card_disabled() ? "OFF" : "ON");
> >>>> 	dis_dev_put();
> >>>> 	return 0;
> >>>> }
> >>>> ---
> >>>>
> >>>> Either dis_dev_get or pci_read_config_dword is the trigger.
> >>>
> >>> What happens if you remove the call to is_card_disabled()? Does the
> >>> system still freeze if you only do the dis_dev_get()/dis_dev_put()?
> >>>
> >>>> Link to the bbswitch module source code:
> >>>> https://github.com/Bumblebee-Project/bbswitch/blob/master/bbswitch.c#L333
> >>>>
> >>>>
> >>>> Am 29.08.2016 um 18:02 schrieb Bjorn Helgaas:
> >>>>> [+cc linux-acpi, linux-kernel, dri-devel]
> >>>>>
> >>>>> Hi Roland,
> >>>>>
> >>>>> I have no idea how to debug this problem. Are you seeing something
> >>>>> that suggests it may be a PCI problem?
> >>>>>
> >>>>> On Tue, Aug 23, 2016 at 11:23:45AM +0200, Roland Singer wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I hope somebody can help me fix this kernel problem, which affects the following machines:
> >>>>>>
> >>>>>> - Clevo P651RA (i7-6700HQ/GTX 965M, part of the P6xxRx family, which is also affected)
> >>>>>> - MSI GE62 Apache Pro (i7-6700HQ/GTX 960M)
> >>>>>> - Gigabyte P35V5 (i7-6700HQ/GTX 970M)
> >>>>>> - Razer Blade 14" (2016) (i7-6700HQ/GTX 970M) (BIOS 5.11, 04/07/2016)
> >>>>>>
> >>>>>>
> >>>>>> The kernel freezes if the graphical user session (Xorg & Wayland) is
> >>>>>> started while the discrete GPU (NVIDIA) is switched off.
> >>>>>> If the discrete GPU is switched off after the graphical session has started,
> >>>>>> everything works as expected until the graphical session is restarted.
> >>>>>>
> >>>>>> This problem seems to be linked to specific BIOS settings. If the computer
> >>>>>> is started with the following kernel command line:
> >>>>>>
> >>>>>> acpi_osi=! acpi_osi="Windows 2009"
> >>>>>>
> >>>>>> then the kernel freeze no longer occurs. However, this required a special
> >>>>>> ACPI DSDT firmware patch for the Razer Blade 2016 laptop:
> >>>>>>
> >>>>>> https://github.com/m4ng0squ4sh/razer_blade_14_2016_acpi_dsdt
> >>>>>>
> >>>>>> I strongly recommend fixing this in the kernel, and I am ready to help
> >>>>>> solve this problem with some guidance.
> >>>>>>
> >>>>>> Here is a link to the GitHub issue with further information:
> >>>>>>
> >>>>>> https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-241212595
> >>>>>>
> >>>>>> Here is some more detailed information:
> >>>>>>
> >>>>>> https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-P651RA/notes.txt
> >>>>>>
> >>>>>> Hope somebody can help.
> >>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
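For reference, the !~cfg_word test in is_card_disabled() quoted above treats an all-ones config read as "device off". The assumption behind it is that a powered-down device typically does not answer the config read, so the read completes as 0xFFFFFFFF, and 0xFFFF is never a valid Vendor ID. The user-space sketch below models only that check; mock_read_config_dword(), card_looks_disabled() and vendor_id() are hypothetical names, not bbswitch or kernel APIs, and the device ID 0x1234 is made up for illustration.

```c
#include <assert.h> /* for the usage checks described after this block */
#include <stdint.h>

/* Hypothetical mock of a 32-bit config read at offset 0: Vendor ID in
 * bits 15:0, Device ID in bits 31:16. Assumption: an unresponsive
 * (powered-off) device makes the read complete with all ones. */
static uint32_t mock_read_config_dword(int powered_on)
{
	/* 0x10de is NVIDIA's PCI vendor ID; device ID 0x1234 is made up. */
	return powered_on ? ((uint32_t)0x1234 << 16) | 0x10de : 0xffffffffu;
}

/* Same expression as is_card_disabled() in bbswitch: ~cfg_word is zero
 * only if all 32 bits are set, so !~cfg_word is 1 for 0xffffffff and
 * 0 for every other value. */
static int card_looks_disabled(uint32_t cfg_word)
{
	return !~cfg_word;
}

/* Extract the Vendor ID (low 16 bits); 0xffff is not a valid vendor. */
static uint16_t vendor_id(uint32_t cfg_word)
{
	return (uint16_t)(cfg_word & 0xffffu);
}
```

With these definitions, card_looks_disabled(mock_read_config_dword(0)) is 1 and card_looks_disabled(mock_read_config_dword(1)) is 0. Note that this only interprets the value once the read has returned; it says nothing about why the read, or the dis_dev_get()/dis_dev_put() sequence, would hang, which is the open question in this thread.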