Re: [PATCH v2] cx23885: only reset DMA on problematic CPUs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alex,


On 19/12/2018 11.26, Brad Love wrote:
> Hi Alex,
>
>
> On 18/12/2018 17.49, Alex Deucher wrote:
>> On Tue, Dec 18, 2018 at 5:59 PM Brad Love <brad@xxxxxxxxxxxxxxxx> wrote:
>>> It is reported that commit 95f408bbc4e4 ("media: cx23885: Ryzen DMA
>>> related RiSC engine stall fixes") caused regresssions with other CPUs.
>>>
>>> Ensure that the quirk will be applied only for the CPUs that
>>> are known to cause problems.
>>>
>>> A module option is added for explicit control of the behaviour.
>>>
>>> Fixes: 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes")
>>>
>>> Signed-off-by: Brad Love <brad@xxxxxxxxxxxxxxxx>
>>> ---
>>> Changes since v1:
>>> - Added module option for three way control
>>> - Removed '7' from pci id description, Ryzen 3 is the same id
>>>
>>>  drivers/media/pci/cx23885/cx23885-core.c | 54 ++++++++++++++++++++++++++++++--
>>>  drivers/media/pci/cx23885/cx23885.h      |  2 ++
>>>  2 files changed, 54 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/media/pci/cx23885/cx23885-core.c b/drivers/media/pci/cx23885/cx23885-core.c
>>> index 39804d8..fb721c7 100644
>>> --- a/drivers/media/pci/cx23885/cx23885-core.c
>>> +++ b/drivers/media/pci/cx23885/cx23885-core.c
>>> @@ -23,6 +23,7 @@
>>>  #include <linux/moduleparam.h>
>>>  #include <linux/kmod.h>
>>>  #include <linux/kernel.h>
>>> +#include <linux/pci.h>
>>>  #include <linux/slab.h>
>>>  #include <linux/interrupt.h>
>>>  #include <linux/delay.h>
>>> @@ -41,6 +42,18 @@ MODULE_AUTHOR("Steven Toth <stoth@xxxxxxxxxxx>");
>>>  MODULE_LICENSE("GPL");
>>>  MODULE_VERSION(CX23885_VERSION);
>>>
>>> +/*
>>> + * Some platforms have been found to require periodic resetting of the DMA
>>> + * engine. Ryzen and XEON platforms are known to be affected. The symptom
>>> + * encountered is "mpeg risc op code error". Only Ryzen platforms employ
>>> + * this workaround if the option equals 1. The workaround can be explicitly
>>> + * disabled for all platforms by setting to 0, the workaround can be forced
>>> + * on for any platform by setting to 2.
>>> + */
>>> +static unsigned int dma_reset_workaround = 1;
>>> +module_param(dma_reset_workaround, int, 0644);
>>> +MODULE_PARM_DESC(dma_reset_workaround, "periodic RiSC dma engine reset; 0-force disable, 1-driver detect (default), 2-force enable");
>>> +
>>>  static unsigned int debug;
>>>  module_param(debug, int, 0644);
>>>  MODULE_PARM_DESC(debug, "enable debug messages");
>>> @@ -603,8 +616,13 @@ static void cx23885_risc_disasm(struct cx23885_tsport *port,
>>>
>>>  static void cx23885_clear_bridge_error(struct cx23885_dev *dev)
>>>  {
>>> -       uint32_t reg1_val = cx_read(TC_REQ); /* read-only */
>>> -       uint32_t reg2_val = cx_read(TC_REQ_SET);
>>> +       uint32_t reg1_val, reg2_val;
>>> +
>>> +       if (!dev->need_dma_reset)
>>> +               return;
>>> +
>>> +       reg1_val = cx_read(TC_REQ); /* read-only */
>>> +       reg2_val = cx_read(TC_REQ_SET);
>>>
>>>         if (reg1_val && reg2_val) {
>>>                 cx_write(TC_REQ, reg1_val);
>>> @@ -2058,6 +2076,36 @@ void cx23885_gpio_enable(struct cx23885_dev *dev, u32 mask, int asoutput)
>>>         /* TODO: 23-19 */
>>>  }
>>>
>>> +static struct {
>>> +       int vendor, dev;
>>> +} const broken_dev_id[] = {
>>> +       /* According with
>>> +        * https://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen%207%201800X/lspci,
>>> +        * 0x1451 is PCI ID for the IOMMU found on Ryzen
>>> +        */
>>> +       { PCI_VENDOR_ID_AMD, 0x1451 },
>> Does this issue only happen with the IOMMU is enabled?  Is it only for
>> p2p transfers?  Until recently the DMA and PCI subsystems didn't
>> actually support p2p properly when the IOMMU was enabled.  that might
>> explain some of the issues.  Additionally, if you match based on the
>> IOMMU id, you won't match if the user disables the IOMMU in the sbios.
>> Is this only an issue with the IOMMU enabled?
>>
>> Alex
>
> I'm unsure of the answers to your questions. I do still have my Ryzen3
> system around, I'll see if I can disable IOMMU and do some tests.
>
> Regards,
>
> Brad


The moment I looked this up I recalled something. During testing I had
to pass iommu=pt as a kernel command line option, that option is still
in my grub config. Without that set I would get critical AMD-VI errors
from the onboard ethernet as well as gpu. I have left that setting as is
the entire time (~8mo) I've been testing the system, because I could not
boot initially without it. I'll try other options now.

Regards,

Brad




>
>
>>> +};
>>> +
>>> +static bool cx23885_does_need_dma_reset(void)
>>> +{
>>> +       int i;
>>> +       struct pci_dev *pdev = NULL;
>>> +
>>> +       if (dma_reset_workaround == 0)
>>> +               return false;
>>> +       else if (dma_reset_workaround == 2)
>>> +               return true;
>>> +
>>> +       for (i = 0; i < sizeof(broken_dev_id); i++) {
>>> +               pdev = pci_get_device(broken_dev_id[i].vendor, broken_dev_id[i].dev, NULL);
>>> +               if (pdev) {
>>> +                       pci_dev_put(pdev);
>>> +                       return true;
>>> +               }
>>> +       }
>>> +       return false;
>>> +}
>>> +
>>>  static int cx23885_initdev(struct pci_dev *pci_dev,
>>>                            const struct pci_device_id *pci_id)
>>>  {
>>> @@ -2069,6 +2117,8 @@ static int cx23885_initdev(struct pci_dev *pci_dev,
>>>         if (NULL == dev)
>>>                 return -ENOMEM;
>>>
>>> +       dev->need_dma_reset = cx23885_does_need_dma_reset();
>>> +
>>>         err = v4l2_device_register(&pci_dev->dev, &dev->v4l2_dev);
>>>         if (err < 0)
>>>                 goto fail_free;
>>> diff --git a/drivers/media/pci/cx23885/cx23885.h b/drivers/media/pci/cx23885/cx23885.h
>>> index d54c7ee..cf965ef 100644
>>> --- a/drivers/media/pci/cx23885/cx23885.h
>>> +++ b/drivers/media/pci/cx23885/cx23885.h
>>> @@ -451,6 +451,8 @@ struct cx23885_dev {
>>>         /* Analog raw audio */
>>>         struct cx23885_audio_dev   *audio_dev;
>>>
>>> +       /* Does the system require periodic DMA resets? */
>>> +       unsigned int            need_dma_reset:1;
>>>  };
>>>
>>>  static inline struct cx23885_dev *to_cx23885(struct v4l2_device *v4l2_dev)
>>> --
>>> 2.7.4
>>>





[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux