Re: [PATCH] mhi: pci_generic: Ensure device readiness before starting MHI

Loic Poulain <loic.poulain@xxxxxxxxxx> · Thu, 11 Feb 2021 19:28:07 +0100

On Thu, 11 Feb 2021 at 18:55, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
>
> On 2/11/2021 10:47 AM, Loic Poulain wrote:
> > On Thu, 11 Feb 2021 at 18:13, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
> >>
> >> On 2/11/2021 10:08 AM, Loic Poulain wrote:
> >>> The PCI device may have not been bound from cold boot and be in
> >>> undefined state, or simply not yet ready for MHI operations. This
> >>> change ensures that the MHI layer is reset to initial state and
> >>> ready for MHI initialization and power up.
> >>>
> >>> Signed-off-by: Loic Poulain <loic.poulain@xxxxxxxxxx>
> >>> ---
> >>>    drivers/bus/mhi/pci_generic.c | 23 +++++++++++++++++++++++
> >>>    1 file changed, 23 insertions(+)
> >>>
> >>> diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
> >>> index c20f59e..bfa0a1e 100644
> >>> --- a/drivers/bus/mhi/pci_generic.c
> >>> +++ b/drivers/bus/mhi/pci_generic.c
> >>> @@ -17,6 +17,8 @@
> >>>    #include <linux/timer.h>
> >>>    #include <linux/workqueue.h>
> >>>
> >>> +#include "core/internal.h"
> >>> +
> >>>    #define MHI_PCI_DEFAULT_BAR_NUM 0
> >>>
> >>>    #define MHI_POST_RESET_DELAY_MS 500
> >>> @@ -391,6 +393,22 @@ static void health_check(struct timer_list *t)
> >>>        mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
> >>>    }
> >>>
> >>> +static void __mhi_sw_reset(struct mhi_controller *mhi_cntrl)
> >>> +{
> >>> +     unsigned int max_wait_ready = 200;
> >>> +
> >>> +     mhi_pci_write_reg(mhi_cntrl, mhi_cntrl->regs + MHICTRL,
> >>> +                       MHICTRL_RESET_MASK);
> >>> +
> >>> +     while (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_READY) {
> >>> +             if (!max_wait_ready--) {
> >>> +                     dev_warn(mhi_cntrl->cntrl_dev, "Not ready\n");
> >>> +                     break;
> >>> +             }
> >>> +             msleep(50);
> >>> +     }
> >>> +}
> >>> +
> >>>    static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >>>    {
> >>>        const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
> >>> @@ -451,6 +469,9 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >>>                goto err_unregister;
> >>>        }
> >>>
> >>> +     /* Before starting MHI, ensure device is in good initial state */
> >>> +     __mhi_sw_reset(mhi_cntrl);
> >>> +
> >>>        err = mhi_sync_power_up(mhi_cntrl);
> >>>        if (err) {
> >>>                dev_err(&pdev->dev, "failed to power up MHI controller\n");
> >>> @@ -532,6 +553,8 @@ static void mhi_pci_reset_done(struct pci_dev *pdev)
> >>>                return;
> >>>        }
> >>>
> >>> +     __mhi_sw_reset(mhi_cntrl);
> >>> +
> >>>        err = mhi_sync_power_up(mhi_cntrl);
> >>>        if (err) {
> >>>                dev_err(&pdev->dev, "failed to power up MHI controller\n");
> >>>
> >>
> >> So, I'm curious, how does this actually work?
> >>
> >>   From what I can see, you define SBL images.  If those get loaded by the
> >> PBL, it doesn't happen over MHI.  PBL will not move MHI to ready state,
> >> except in the specific instance of a fatal error.
> >
> > I defined generic SBL images for flashless controller versions, but
> > mine is not, and so it boots directly in mission mode.
> >
> >>
> >> Your above change works if the device comes up straight in mission mode
> >> (AMSS), but if it comes up in PBL, you are going to hit the timeout and
> >> dev_warn() every time.
> >
> > Ok, I thought we should get into MHI ready state, whatever the
> > 'execution environment'... So I definitely need to take that into
> > consideration. thanks.
>
> I could see where you could think that, which is why I commented.  I
> didn't want you to run into issues later, assuming those issues are
> valid to you.
>
> MHI only gets into the ready state via EEs which drive MHI.  PBL
> famously does not drive MHI because PBL is encoded into hardware and
> extremely difficult to fix, so it is generally designed with the mantra
> of "simpler is more reliable".

Ok understood.

>
> Hopefully I didn't throw a wrench in things for you.  Just trying to
> save you some pain later.

Yes, that's perfectly fine and valid, so I'm going to rework that.

Regards,
Loic