Re: FAILED: patch "[PATCH] drm/nouveau: Move irq setup/teardown to pci ctor/dtor" failed to apply to 4.14-stable tree

Lyude Paul <lyude@xxxxxxxxxx> · Mon, 29 Jan 2018 12:47:05 -0500

Oh sweet, looks like this managed to make it in on the final pull! I had cc'd
stable in case it didn't manage to. You can ignore backporting this patch; the
problem shouldn't be present on 4.14 and earlier

On Sat, 2018-01-27 at 11:02 +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> The patch below does not apply to the 4.14-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.
> 
> thanks,
> 
> greg k-h
> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 0fd189a95fdbc631737df5f27a0fc0a3dd31b75e Mon Sep 17 00:00:00 2001
> From: Lyude Paul <lyude@xxxxxxxxxx>
> Date: Thu, 25 Jan 2018 18:29:53 -0500
> Subject: [PATCH] drm/nouveau: Move irq setup/teardown to pci ctor/dtor
> 
> For a while we've been having issues with seemingly random interrupts
> coming from nvidia cards when resuming them. Originally the fix for this
> was thought to be just re-arming the MSI interrupt registers right after
> re-allocating our IRQs, however it seems a lot of what we do is both
> wrong and not even nessecary.
> 
> This was made apparent by what appeared to be a regression in the
> mainline kernel that started introducing suspend/resume issues for
> nouveau:
> 
>         a0c9259dc4e1 (irq/matrix: Spread interrupts on allocation)
> 
> After this commit was introduced, we started getting interrupts from the
> GPU before we actually re-allocated our own IRQ (see references below)
> and assigned the IRQ handler. Investigating this turned out that the
> problem was not with the commit, but the fact that nouveau even
> free/allocates it's irqs before and after suspend/resume.
> 
> For starters: drivers in the linux kernel haven't had to handle
> freeing/re-allocating their IRQs during suspend/resume cycles for quite
> a while now. Nouveau seems to be one of the few drivers left that still
> does this, despite the fact there's no reason we actually need to since
> disabling interrupts from the device side should be enough, as the
> kernel is already smart enough to know to disable host-side interrupts
> for us before going into suspend. Since we were tearing down our IRQs by
> hand however, that means there was a short period during resume where
> interrupts could be received before we re-allocated our IRQ which would
> lead to us getting an unhandled IRQ. Since we never handle said IRQ and
> re-arm the interrupt registers, this would cause us to miss all of the
> interrupts from the GPU and cause our init process to start timing out
> on anything requiring interrupts.
> 
> So, since this whole setup/teardown every suspend/resume cycle is
> useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
> functions instead so they're only called at driver load and driver
> unload. This should fix most of the issues with pending interrupts on
> resume, along with getting suspend/resume for nouveau to work again.
> 
> As well, this probably means we can also just remove the msi rearm call
> inside nvkm_pci_init(). But since our main focus here is to fix
> suspend/resume before 4.15, we'll save that for a later patch.
> 
> Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
> Cc: Karol Herbst <kherbst@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Mike Galbraith <efault@xxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Ben Skeggs <bskeggs@xxxxxxxxxx>
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
> b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
> index deb96de54b00..ee2431a7804e 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
> @@ -71,6 +71,10 @@ nvkm_pci_intr(int irq, void *arg)
>  	struct nvkm_pci *pci = arg;
>  	struct nvkm_device *device = pci->subdev.device;
>  	bool handled = false;
> +
> +	if (pci->irq < 0)
> +		return IRQ_HANDLED;
> +
>  	nvkm_mc_intr_unarm(device);
>  	if (pci->msi)
>  		pci->func->msi_rearm(pci);
> @@ -84,11 +88,6 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
>  {
>  	struct nvkm_pci *pci = nvkm_pci(subdev);
>  
> -	if (pci->irq >= 0) {
> -		free_irq(pci->irq, pci);
> -		pci->irq = -1;
> -	}
> -
>  	if (pci->agp.bridge)
>  		nvkm_agp_fini(pci);
>  
> @@ -108,8 +107,20 @@ static int
>  nvkm_pci_oneinit(struct nvkm_subdev *subdev)
>  {
>  	struct nvkm_pci *pci = nvkm_pci(subdev);
> -	if (pci_is_pcie(pci->pdev))
> -		return nvkm_pcie_oneinit(pci);
> +	struct pci_dev *pdev = pci->pdev;
> +	int ret;
> +
> +	if (pci_is_pcie(pci->pdev)) {
> +		ret = nvkm_pcie_oneinit(pci);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = request_irq(pdev->irq, nvkm_pci_intr, IRQF_SHARED, "nvkm",
> pci);
> +	if (ret)
> +		return ret;
> +
> +	pci->irq = pdev->irq;
>  	return 0;
>  }
>  
> @@ -117,7 +128,6 @@ static int
>  nvkm_pci_init(struct nvkm_subdev *subdev)
>  {
>  	struct nvkm_pci *pci = nvkm_pci(subdev);
> -	struct pci_dev *pdev = pci->pdev;
>  	int ret;
>  
>  	if (pci->agp.bridge) {
> @@ -131,28 +141,34 @@ nvkm_pci_init(struct nvkm_subdev *subdev)
>  	if (pci->func->init)
>  		pci->func->init(pci);
>  
> -	ret = request_irq(pdev->irq, nvkm_pci_intr, IRQF_SHARED, "nvkm",
> pci);
> -	if (ret)
> -		return ret;
> -
> -	pci->irq = pdev->irq;
> -
>  	/* Ensure MSI interrupts are armed, for the case where there are
>  	 * already interrupts pending (for whatever reason) at load time.
>  	 */
>  	if (pci->msi)
>  		pci->func->msi_rearm(pci);
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static void *
>  nvkm_pci_dtor(struct nvkm_subdev *subdev)
>  {
>  	struct nvkm_pci *pci = nvkm_pci(subdev);
> +
>  	nvkm_agp_dtor(pci);
> +
> +	if (pci->irq >= 0) {
> +		/* freq_irq() will call the handler, we use pci->irq == -1
> +		 * to signal that it's been torn down and should be a noop.
> +		 */
> +		int irq = pci->irq;
> +		pci->irq = -1;
> +		free_irq(irq, pci);
> +	}
> +
>  	if (pci->msi)
>  		pci_disable_msi(pci->pdev);
> +
>  	return nvkm_pci(subdev);
>  }
>  
> 
-- 
Cheers,
	Lyude Paul