Re: [Nouveau] [PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 30 May 2016 at 12:23, Peter Wu <peter@xxxxxxxxxxxxx> wrote:
> On Mon, May 30, 2016 at 11:48:34AM +0100, Emil Velikov wrote:
>> On 27 May 2016 at 22:31, Peter Wu <peter@xxxxxxxxxxxxx> wrote:
>> > On Fri, May 27, 2016 at 02:01:39PM +0100, Emil Velikov wrote:
>> >> Hi Peter,
>> >>
>> >> On 24 May 2016 at 23:53, Peter Wu <peter@xxxxxxxxxxxxx> wrote:
>> >> > Since "PCI: Add runtime PM support for PCIe ports", the parent PCIe port
>> >> > can be runtime-suspended which disables power resources via ACPI. This
>> >> > is incompatible with DSM, resulting in a GPU device which is still in D3
>> >> > and locks up the kernel on resume.
>> >> >
>> >> > Mirror the behavior of Windows 8 and newer[1] (as observed via an AMLi
>> >> > debugger trace) and stop using the DSM functions for D3cold when power
>> >> > resources are available on the parent PCIe port.
>> >> >
>> >> >  [1]: https://msdn.microsoft.com/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold
>> >> >
>> >> > Signed-off-by: Peter Wu <peter@xxxxxxxxxxxxx>
>> >> > ---
>> >> >  drivers/gpu/drm/nouveau/nouveau_acpi.c | 34 ++++++++++++++++++++++++++++++----
>> >> >  1 file changed, 30 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
>> >> > index df9f73e..e469df7 100644
>> >> > --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
>> >> > +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
>> >> > @@ -46,6 +46,7 @@ static struct nouveau_dsm_priv {
>> >> >         bool dsm_detected;
>> >> >         bool optimus_detected;
>> >> >         bool optimus_flags_detected;
>> >> > +       bool optimus_skip_dsm;
>> >> >         acpi_handle dhandle;
>> >> >         acpi_handle rom_handle;
>> >> >  } nouveau_dsm_priv;
>> >> > @@ -212,8 +213,26 @@ static const struct vga_switcheroo_handler nouveau_dsm_handler = {
>> >> >         .get_client_id = nouveau_dsm_get_client_id,
>> >> >  };
>> >> >
>> >> > +/* Firmware supporting Windows 8 or later do not use _DSM to put the device into
>> >> > + * D3cold, they instead rely on disabling power resources on the parent. */
>> >> > +static bool nouveau_pr3_present(struct pci_dev *pdev)
>> >> > +{
>> >> > +       struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
>> >> > +       struct acpi_device *ad;
>> >> > +
>> >> > +       if (!parent_pdev)
>> >> > +               return false;
>> >> > +
>> >> > +       ad = ACPI_COMPANION(&parent_pdev->dev);
>> >> > +       if (!ad)
>> >> > +               return false;
>> >> > +
>> >> > +       return ad->power.flags.power_resources;
>> >> > +}
>> >> > +
>> >> >  static void nouveau_dsm_pci_probe(struct pci_dev *pdev, bool *has_mux,
>> >> > -                                 bool *has_opt, bool *has_opt_flags)
>> >> > +                                 bool *has_opt, bool *has_opt_flags,
>> >> > +                                 bool *has_power_resources)
>> >> >  {
>> >> >         acpi_handle dhandle;
>> >> >         bool supports_mux;
>> >> > @@ -238,6 +257,7 @@ static void nouveau_dsm_pci_probe(struct pci_dev *pdev, bool *has_mux,
>> >> >         *has_mux = supports_mux;
>> >> >         *has_opt = !!optimus_funcs;
>> >> >         *has_opt_flags = optimus_funcs & (1 << NOUVEAU_DSM_OPTIMUS_FLAGS);
>> >> > +       *has_power_resources = false;
>> >> >
>> >> >         if (optimus_funcs) {
>> >> >                 uint32_t result;
>> >> > @@ -247,6 +267,8 @@ static void nouveau_dsm_pci_probe(struct pci_dev *pdev, bool *has_mux,
>> >> >                          (result & OPTIMUS_ENABLED) ? "enabled" : "disabled",
>> >> >                          (result & OPTIMUS_DYNAMIC_PWR_CAP) ? "dynamic power, " : "",
>> >> >                          (result & OPTIMUS_HDA_CODEC_MASK) ? "hda bios codec supported" : "");
>> >> > +
>> >> > +               *has_power_resources = nouveau_pr3_present(pdev);
>> >> >         }
>> >> >  }
>> >> >
>> >> > @@ -258,6 +280,7 @@ static bool nouveau_dsm_detect(void)
>> >> >         bool has_mux = false;
>> >> >         bool has_optimus = false;
>> >> >         bool has_optimus_flags = false;
>> >> > +       bool has_power_resources = false;
>> >> >         int vga_count = 0;
>> >> >         bool guid_valid;
>> >> >         bool ret = false;
>> >> > @@ -273,14 +296,14 @@ static bool nouveau_dsm_detect(void)
>> >> >                 vga_count++;
>> >> >
>> >> >                 nouveau_dsm_pci_probe(pdev, &has_mux, &has_optimus,
>> >> > -                                     &has_optimus_flags);
>> >> > +                                     &has_optimus_flags, &has_power_resources);
>> >> >         }
>> >> >
>> >> >         while ((pdev = pci_get_class(PCI_CLASS_DISPLAY_3D << 8, pdev)) != NULL) {
>> >> >                 vga_count++;
>> >> >
>> >> >                 nouveau_dsm_pci_probe(pdev, &has_mux, &has_optimus,
>> >> > -                                     &has_optimus_flags);
>> >> > +                                     &has_optimus_flags, &has_power_resources);
>> >> >         }
>> >> >
>> >> This and earlier patch break things in a subtle way.
>> >>
>> >> Namely: upon the second (and any later) call into the
>> >> nouveau_dsm_pci_probe() function, the had_foo flags are reset. Thus
>> >> only the specifics of the _final_ device are being used (at a later
>> >> stage). IMHO one should change that to "_any_ device", which will
>> >> match the original code and the actual intent further down in the
>> >> file.
>> >
>> > The flags are only reset if any of the MUX or Optimus handles are found.
>> > If both are missing, the flags are not overridden. This is from patch 1:
>> >
>> > +       /* Does not look like a Nvidia device. */
>> > +       if (!supports_mux && !supports_opt)
>> > +               return;
>> >
>> This is precisely what I'm saying, and I think it's wrong/strange. If
>> you've detected that device A support_{X,Y}, you'll reset the
>> support_{X,Y} flag anyway if device B is present... (continues further
>> down)
>
> The flags will only be reset when device B supports at least one
> function.
>
Indeed. Seems like I completely misread the code on multiple
occasions. Sorry about the noise.

>> > The reason why later calls override early ones is because some Optimus
>> > laptops have the _DSM method on both the Intel GPU (00:02.0) and the
>> > Nvidia one (01:00.0).
>> >
>> I agree with Lukas idea that one could/should be checking for nvidia
>> devices (perhaps in nouveau_dsm_pci_probe() or just before calling it
>> ?).
>
> That could break PM on at least two Acer laptops. The Acer Travelmate
> 8472TG from 2011 (acpidump[1]) has two DSM on the Nvidia and Intel ACPI
> handles:
>
>  - Nvidia: supports MXM methods only.
>  - Intel: supports the older Nvidia UUID (for toggling power and
>    possibly other things).
>
>  [1]: https://github.com/Bumblebee-Project/bbswitch/issues/4#issuecomment-219988501
>
> There is also an Acer Aspire 5742G which possibly breaks (linked in the
> above issue), but that could be a configuration issue that disabled
> Optimus in BIOS (unconfirmed).
>
> If it matters, both of these laptops have a MXMX method (Select Display
> Data Channel), but their MXMI (Return Specification Support Level) and
> MXMS (Return MXM Structure) functions are disfunctional. There is also a
> MXDS function on both ACPI handles, but these are not hooked to the WMI
> interface for some reason. No idea of Acer has hacked up some drivers to
> work with this, outside these models I do not know others that are also
> affected by this issue.
>
/me takes a sigh "Why Acer why ..." :-)

>> > The previous detection method would fail in this scenario:
>> >  1. One device reports support for X and Y (has_x = 1, has_y = 1). Write
>> >     ACPI handle A to nouveau_dsm_priv.dhandle.
>> >  2. Another device reports support for X only (has_x = 1). Write
>> >     ACPI handle B to nouveau_dsm_priv.dhandle.
>> >  3. End result: has_x = 1, has_y = 1, dhandle = B. But ACPI handle B
>> >     does not really support Y!
>>
>> ...  so to avoid the above case and preserve the original ideas ('do
>> not discard earlier device caps' and 'Optimus takes precedence over
>> DSM v1') one could do the following:
>>
>>  - decouple the "feature check" and "set the dhandle"
>>
>>  - pick the 'ideal' one based on the feature set provided. if multiple
>> pick one based on $insert_heuristics
>>  - set the dhandle
>>
>> What do you think ?
>
> The dhandle is only set when at least one valid DSM was found on the
> device. The dhandle assignment could indeed be moved to the caller,
> making it more obvious that the dhandle is only valid when the
> capabilities are detected (this does not have a functional change
> though). I'll do it in the next version.

That will be amazing, thanks.
Emil
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux