Re: [PATCH 7/9] mmc: core: Allow CMD13 polling when switch to HS200 mode

Adrian Hunter <adrian.hunter@xxxxxxxxx> · Fri, 18 Nov 2016 14:32:36 +0200

On 18/11/16 14:20, Ulf Hansson wrote:
> On 18 November 2016 at 10:30, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>> On 17/11/16 17:02, Ulf Hansson wrote:
>>> On 17 November 2016 at 11:23, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>>>> On 16/11/16 12:51, Ulf Hansson wrote:
>>>>> In cases when the mmc host doesn't support HW busy detection, polling for
>>>>> busy by using CMD13 is beneficial. The reasons have already been explained
>>>>> in earlier change logs.
>>>>>
>>>>> To allow polling with CMD13, let's provide MMC_TIMING_MMC_HS200 as the
>>>>> timing parameter to __mmc_switch(), which makes sure the mmc host and the
>>>>> mmc card operate at the same bus timing during the polling.
>>>>
>>>> I have reports of cases where CMD13 always gives CRC errors after switch
>>>> to HS200.  Currently we are assuming the low frequency should mean that
>>>> won't happen, but it does in some cases.  That is not entirely surprising
>>>> since HS200 needs tuning at the final operating frequency.
>>>
>>> From a logical point of view, and if tuning is also needed for the CMD
>>> line, this somehow makes sense.
>>>
>>> However, this is *not* how the JEDEC spec describes the HS200 switch
>>> sequence. It is clearly stated that the host should validate the CMD6
>>> status via sending a CMD13 command, *before* performing tuning.
>>
>> I agree, it seems not to be following spec.
>>
>>>
>>> Could it be that the observations about the CRC errors are related to
>>> a controller/driver issue and not a card issue?
>>
>> I don't know what causes the problem (and I have a sneaking suspicion that
>> if vendors configured / designed their boards correctly, it wouldn't
>> happen).  However, while some cards have better signal characteristics than
>> others, tuning is a host controller issue - the card doesn't care.
>>
>>>
>>>>
>>>> What I would like to do for hosts that support busy waiting or DAT0 polling
>>>> (i.e. MMC_CAP_WAIT_WHILE_BUSY or host->ops->card_busy), is to ignore CRC
>>>> errors from the CMD13 that checks the switch status.  The main reason for
>>>> doing that is that we really expect the switch to succeed and, given HS200
>>>> tuning requirement, the CRC error is not a reliable means of determining
>>>> that it hasn't.
>>>
>>> Hmm. So what you are saying is that CMD13 polling for HS200 doesn't
>>> work, as tuning is needed.
>>
>> I would assume that vendors integrate a working combination of eMMC and host
>> controller, so if polling is the only option, then we could assume it will work.
>>
>>>
>>> So, to me that means we need to fall-back to use the generic CMD6
>>> timeout instead (when HW busy detection isn't supported).
>>
>> Or, in the ignore_crc/retry_err_crc case, return -EILSEQ instead of
>> -ETIMEDOUT, and catch and ignore the error in the calling code.  Then you
>> get polling if it works, otherwise you get CRC errors until timeout.
>>
>>>
>>>>
>>>> With the existing code I would just change the err check:
>>>>
>>>>
>>>> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
>>>> index 3268fcd3378d..c8862c58b60b 100644
>>>> --- a/drivers/mmc/core/mmc.c
>>>> +++ b/drivers/mmc/core/mmc.c
>>>> @@ -1387,6 +1387,13 @@ static int mmc_select_hs200(struct mmc_card *card)
>>>>
>>>>                 err = mmc_switch_status(card);
>>>>                 /*
>>>> +                * For HS200, CRC errors are not a reliable way to know the
>>>> +                * switch failed. If there really is a problem, we would expect
>>>> +                * tuning will fail and the result ends up the same.
>>>> +                */
>>>> +               if (err == -EILSEQ)
>>>> +                       err = 0;
>>>> +               /*
>>>
>>> I don't think ignoring CRC errors is reliable when verifying the CMD6
>>> status. My point is that we must not parse the status, in case of CRC
>>> errors as it can't be trusted.
>>
>> I agree, but mmc_switch_status() doesn't look at the response if there is an
>> error.
> 
> Correct, it's only during CMD13 polling when CRC was ignored.
> 
>>
>>>
>>> So, then we might as well just ignore validating the CMD6 status
>>> altogether, but instead always move on to the tuning and hope that it
>>> succeeds.
>>
>> That is a possibility, but it seemed to me that it was worth checking for
>> all the users where it does work. i.e. if CMD13 does not give a CRC error
>> then validate the response, and if CMD13 does give a CRC error then ignore
>> the response and keep going anyway.
> 
> Okay, let me think about this.
> 
>>
>>>
>>> I think the CMD21 (tuning) should set the ILLEGAL COMMAND if HS200
>>> mode isn't enabled, so we could check that. Anyway, we should fail
>>> with the tuning if the earlier HS200 switch also failed. Don't you
>>> think?
>>
>> Yes, CMD21 is an illegal command if the mode is not HS200.  The card should
>> set ILLEGAL_COMMAND but also not respond, i.e. there will be a timeout error.
>> That could cause a long delay before tuning finally fails.  The only way to
>> mitigate that would be to make ignoring the CRC error a host-specific option
>> (e.g. MMC_CAP_... flag).  Arguably, if the switch fails, the mode is broken
>> and should not have been allowed in the first place.
> 
> Not sure why there should be a long delay?
> 
> If the CMD21 fails with a timeout, it's like any other command that
> fails with a timeout, right?

Ah, right.  I was thinking of the data timeout, but yes the command timeout
will kick in first of course.

> 
> So why should this one take longer to report for the host compared to others?
> 
>>
>>>
>>>>                  * mmc_select_timing() assumes timing has not changed if
>>>>                  * it is a switch error.
>>>>                  */
>>>>
>>>>
>>>> Then to support polling:
>>>>
>>>>
>>>> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
>>>> index c8862c58b60b..66d8d57ae2fb 100644
>>>> --- a/drivers/mmc/core/mmc.c
>>>> +++ b/drivers/mmc/core/mmc.c
>>>> @@ -1352,6 +1352,7 @@ static int mmc_select_hs200(struct mmc_card *card)
>>>>  {
>>>>         struct mmc_host *host = card->host;
>>>>         unsigned int old_timing, old_signal_voltage;
>>>> +       bool send_status;
>>>>         int err = -EINVAL;
>>>>         u8 val;
>>>>
>>>> @@ -1373,18 +1374,20 @@ static int mmc_select_hs200(struct mmc_card *card)
>>>>          * switch to HS200 mode if bus width is set successfully.
>>>>          */
>>>>         err = mmc_select_bus_width(card);
>>>> -       if (err > 0) {
>>>> -               val = EXT_CSD_TIMING_HS200 |
>>>> -                     card->drive_strength << EXT_CSD_DRV_STR_SHIFT;
>>>> -               err = __mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
>>>> -                                  EXT_CSD_HS_TIMING, val,
>>>> -                                  card->ext_csd.generic_cmd6_time, 0,
>>>> -                                  true, false, true);
>>>> -               if (err)
>>>> -                       goto err;
>>>> -               old_timing = host->ios.timing;
>>>> -               mmc_set_timing(host, MMC_TIMING_MMC_HS200);
>>>> +       if (err <= 0)
>>>> +               goto err;
>>>> +
>>>> +       send_status = !(host->caps & MMC_CAP_WAIT_WHILE_BUSY) &&
>>>> +                     !host->ops->card_busy;
>>>> +       old_timing = host->ios.timing;
>>>> +
>>>> +       val = EXT_CSD_TIMING_HS200 |
>>>> +             card->drive_strength << EXT_CSD_DRV_STR_SHIFT;
>>>> +       err = __mmc_switch(card, EXT_CSD_CMD_SET_NORMAL, EXT_CSD_HS_TIMING, val,
>>>> +                          card->ext_csd.generic_cmd6_time,
>>>> +                          MMC_TIMING_MMC_HS200, true, send_status, true);
>>>>
>>>> +       if (!err && !send_status) {
>>>>                 err = mmc_switch_status(card);
>>>>                 /*
>>>>                  * For HS200, CRC errors are not a reliable way to know the
>>>>
>>>>
>>>>
>>>> Thoughts?
>>>
>>> Well, I think the main problem is that if we have cards that return
>>> CRC errors even after the HS200 switch, then we can't use polling, as
>>> we can't trust the CMD6 status we would be parsing.
>>
>> As I wrote above, if there is no option but polling then we could expect it
>> to work.  And if CMD13 does not give a CRC error then we can validate the
>> response, only ignoring it if there is a CRC error.
>>
>> I should point out that retrying CMD13 will clear the error bits in the
>> status so there is no point retrying when checking for the SWITCH_ERROR bit.
>> i.e. we need a version of __switch_send_status() that sets retries to zero.
> 
> Are you really sure about this?

I don't think I have ever tested it.  I was going on what the spec says:
"1) Error bit. Signals an error condition was detected by the device. These
bits are cleared as soon as the response (reporting the error) is sent out."
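
To illustrate why that wording matters for retries, here is a minimal
userspace sketch, not kernel code. The card model (struct fake_card,
fake_cmd13(), fake_send_status()) and the ERR_CRC value are hypothetical
names made up for this example, and it assumes the card really does clear
the error bit as soon as the response has gone out, which is exactly the
part that has not been tested:

```c
#include <assert.h>
#include <stdint.h>

#define SWITCH_ERROR (1u << 7)   /* bit 7, same position as R1_SWITCH_ERROR */
#define ERR_CRC      (-1)        /* stand-in for -EILSEQ in this model */

/* Hypothetical model of a card; an assumption, not the real device. */
struct fake_card {
	uint32_t status;        /* current device status bits */
	int corrupt_responses;  /* how many responses arrive with a bad CRC */
};

/* Model of one CMD13: the card sends its status, then clears the error bit. */
static int fake_cmd13(struct fake_card *card, uint32_t *resp)
{
	uint32_t sent = card->status;

	assert(resp);
	/* Assumed spec behaviour: cleared once the response is sent out. */
	card->status &= ~SWITCH_ERROR;

	if (card->corrupt_responses > 0) {
		card->corrupt_responses--;
		return ERR_CRC;  /* host saw a CRC error, response is unusable */
	}
	*resp = sent;
	return 0;
}

/* Model of sending CMD13 with a retry count, retrying on any error. */
static int fake_send_status(struct fake_card *card, uint32_t *resp, int retries)
{
	int err;

	do {
		err = fake_cmd13(card, resp);
	} while (err && retries-- > 0);

	return err;
}
```

In this model, a card that starts with SWITCH_ERROR set and corrupts one
response reports success with the bit clear when retries is 3, so a failed
switch would look like a good one.  With retries set to zero the caller gets
the CRC error back and at least knows the status is unknown.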

> 
> I thought the switch status remained present in the device, and was only
> cleared when a new CMD6 command is sent (or on a reset, of course); that
> would make more sense to me. :-)
> Anyway, this would mean that the old CMD13 polling method was broken even
> in this sense.
> 
> Okay, some more tests seem to be needed here. I will do some local
> hacks to explore this.
> 
> Kind regards
> Uffe
> 
