Re: Regression after "do not use CMD13 to get status after speed mode switch"

Chaotian Jing <chaotian.jing@xxxxxxxxxxxx> · Thu, 3 Nov 2016 11:39:34 +0800



On Wed, 2016-11-02 at 14:51 +0200, Adrian Hunter wrote:
> On 02/11/16 12:28, Chaotian Jing wrote:
> > On Wed, 2016-11-02 at 10:19 +0200, Adrian Hunter wrote:
> >> On 01/11/16 03:43, Chaotian Jing wrote:
> >>> On Mon, 2016-10-31 at 15:09 +0200, Adrian Hunter wrote:
> >>>> On 27/10/16 13:04, Ulf Hansson wrote:
> >>>>> On 20 October 2016 at 09:06, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> >>>>>> On 20 October 2016 at 04:22, Chaotian Jing <chaotian.jing@xxxxxxxxxxxx> wrote:
> >>>>>>> On Wed, 2016-10-19 at 18:41 +0200, Ulf Hansson wrote:
> >>>>>>>> Adrian, Linus,
> >>>>>>>>
> >>>>>>>> Thanks for looking into this and reporting!
> >>>>>>>>
> >>>>>>>> On 18 October 2016 at 15:23, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> >>>>>>>>> On 18/10/16 11:36, Linus Walleij wrote:
> >>>>>>>>>> On Mon, Oct 17, 2016 at 4:32 PM, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Before this patch the eMMC is detected and all partitions enumerated
> >>>>>>>>>>> immediately, but after the patch it doesn't come up at all, except
> >>>>>>>>>>> sometimes, when it appears minutes (!) after boot, all of a sudden.
> >>>>>>>>>>
> >>>>>>>>>> FYI this is what it looks like when it eventually happens:
> >>>>>>>>>> root@msm8660:/ [  627.710175] mmc0: new high speed MMC card at address 0001
> >>>>>>>>>> [  627.711641] mmcblk0: mmc0:0001 SEM04G 3.69 GiB
> >>>>>>>>>> [  627.715485] mmcblk0boot0: mmc0:0001 SEM04G partition 1 1.00 MiB
> >>>>>>>>>> [  627.736654] mmcblk0boot1: mmc0:0001 SEM04G partition 2 1.00 MiB
> >>>>>>>>>> [  627.747397] mmcblk0rpmb: mmc0:0001 SEM04G partition 3 128 KiB
> >>>>>>>>>> [  627.756326]  mmcblk0: p1 p2 p3 p4 < p5 p6 p7 p8 p9 p10 p11 p12 p13
> >>>>>>>>>> p14 p15 p16 p17 p18 p19 p20 p21 >
> >>>>>>>>>>
> >>>>>>>>>> So after 627 seconds, a bit hard for users to wait this long for their
> >>>>>>>>>> root filesystem.
> >>>>>>>>>
> >>>>>>>>> If the driver does not support busy detection and the eMMC card provides
> >>>>>>>>> zero as the cmd6 generic timeout (which it may especially as cmd6 generic
> >>>>>>>>> timeout wasn't added until eMMCv4.5), then __mmc_switch() defaults to
> >>>>>>>>> waiting 10 minutes i.e.
> >>>>>>>>>
> >>>>>>>>> #define MMC_OPS_TIMEOUT_MS      (10 * 60 * 1000) /* 10 minute timeout */
> >>>>>>>>
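
To make the arithmetic above concrete, here is a minimal sketch of the
fallback as described in this thread. It is not the kernel code: the
helper names and parameters are hypothetical, and only MMC_OPS_TIMEOUT_MS
is taken from the quoted text.

```c
#include <stdbool.h>

/* Value quoted in the thread. */
#define MMC_OPS_TIMEOUT_MS	(10 * 60 * 1000) /* 10 minute timeout */

/*
 * Hypothetical helper: timeout used for a CMD6 switch. A card that
 * reports 0 as its generic CMD6 time falls back to the 10 minute default.
 */
static unsigned int switch_timeout_ms(unsigned int generic_cmd6_time_ms)
{
	return generic_cmd6_time_ms ? generic_cmd6_time_ms : MMC_OPS_TIMEOUT_MS;
}

/*
 * Hypothetical helper: how long the switch has to be waited out blindly.
 * If completion can be observed (host busy detection or CMD13 polling),
 * no blind wait is needed and 0 is returned. Otherwise the only option
 * left is to sit out the whole timeout.
 */
static unsigned int blind_wait_ms(bool has_busy_detect, bool use_cmd13,
				  unsigned int generic_cmd6_time_ms)
{
	if (has_busy_detect || use_cmd13)
		return 0;
	return switch_timeout_ms(generic_cmd6_time_ms);
}
```

With no busy detection, no CMD13 polling and a generic CMD6 time of 0,
blind_wait_ms() gives 600000 ms, which is in line with the card showing
up at about 627 seconds in the log above.
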
> >>>>>>>> Urgh! Yes, I have verified that this is exactly what happens.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> So removal of CMD13 polling for HS mode (as per commit
> >>>>>>>>> 08573eaf1a70104f83fdbee9b84e5be03480e9ed) is going to be a problem for some
> >>>>>>>>> combinations of eMMC cards and host drivers.
> >>>>>>>>
> >>>>>>>> I was looking at the __mmc_switch() function; it's just a pain to walk
> >>>>>>>> through it :-) So first I decided to clean it up and factor out the
> >>>>>>>> polling parts. I will post the patches first thing tomorrow morning;
> >>>>>>>> I'm running some final tests right now.
> >>>>>>>>
> >>>>>>>> Although, that of course doesn't solve our problem. As I see it we
> >>>>>>>> only have a few options here.
> >>>>>>>>
> >>>>>>>> 1) In case when cmd6 generic timeout isn't available, let's assign
> >>>>>>>> another empirically selected value.
> >>>>>>>> 2) Use a specific timeout when switching to HS mode.
> >>>>>>>> 3) Even if we deploy 1 (and 2), perhaps we still should allow polling
> >>>>>>>> with CMD13 for switching to HS mode - unless it causes issues for some
> >>>>>>>> cards/drivers combination?
> >>>>>>>>
> >>>>>>>> BTW, I already tried 2) and it indeed solves the problem, although
> >>>>>>>> depending on the selected timeout, it might delay the card detection
> >>>>>>>> process.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> I just tried switching to HS mode with a Hynix eMMC: the first CMD13
> >>>>>>> gets a response of 0x900, but the eMMC is still pulling DAT0 low, so
> >>>>>>> CMD13 cannot indicate the current card status in this case.
> >>>>>>
> >>>>>> Thanks for sharing that. Okay, so clearly we have some cards that
> >>>>>> don't support polling with CMD13 when switching to HS mode.
> >>>>>> One could of course add quirks for these kinds of cards and use a fixed
> >>>>>> delay for them, but finding out which cards these are is going to
> >>>>>> be hard.
> >>>>>>
> >>>>>> It seems like we are left with using a fixed delay. Any ideas of what
> >>>>>> such a delay should be? And should we have one specific to switching to
> >>>>>> the various speed modes and a different one that overrides the CMD6
> >>>>>> generic timeout, when it doesn't exist?
> >>>>>>
> >>>>>
> >>>>> Replying to my own earlier response, as I believe the problem could
> >>>>> also be related to another old commit, see below.
> >>>>>
> >>>>> commit a27fbf2f067b0cd6f172c8b696b9a44c58bfaa7a
> >>>>> Author: Seungwon Jeon <tgih.jun@xxxxxxxxxxx>
> >>>>> Date:   Wed Sep 4 21:21:05 2013 +0900
> >>>>>
> >>>>>     mmc: add ignorance case for CMD13 CRC error
> >>>>>
> >>>>>     While speed mode is changed, CMD13 cannot be guaranteed.
> >>>>>     According to the spec., it is not recommended to use CMD13
> >>>>>     to check the busy completion of the timing change.
> >>>>>     If CMD13 is used in this case, CRC error must be ignored.
> >>>>>
> >>>>>     Signed-off-by: Seungwon Jeon <tgih.jun@xxxxxxxxxxx>
> >>>>>     Acked-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >>>>>     Signed-off-by: Chris Ball <cjb@xxxxxxxxxx>
> >>>>>
> >>>>>
> >>>>> The intent with this commit was not really correct. We don't want to
> >>>>> ignore CRC errors, but instead we should *re-try* sending CMD13 once
> >>>>> we get a CRC error.
> >>>>>
> >>>>> Unfortunately, since this commit we instead tell the host driver to
> >>>>> *ignore* CRC errors, so it reads the status and returns 0
> >>>>> (indicating success). In the mmc core, __mmc_switch() will thus
> >>>>> parse the status reply, even for a reply that might have been received
> >>>>> with a CRC error. Not good!
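
For illustration, the "re-try instead of ignore" idea described above
could be sketched like this. Everything here is hypothetical: the
callback type, the fake card and the helper are stand-ins for
illustration and not mmc core APIs. The sketch assumes a CRC error is
reported as -EILSEQ, which is the usual convention in Linux MMC host
drivers.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for "send CMD13": 0 and *status, or -errno. */
typedef int (*send_status_fn)(void *ctx, uint32_t *status);

/*
 * Hypothetical helper: re-send CMD13 when the response fails its CRC
 * check, instead of parsing a reply that may be corrupted. The number
 * of retries is bounded, so a link that always produces CRC errors
 * ends with -EILSEQ rather than looping forever.
 */
static int read_status_retry_crc(send_status_fn send, void *ctx,
				 unsigned int max_retries, uint32_t *status)
{
	unsigned int attempt;
	int err = -EILSEQ;

	for (attempt = 0; attempt <= max_retries; attempt++) {
		err = send(ctx, status);
		if (err != -EILSEQ)
			break;	/* success, or an error a retry won't fix */
	}
	return err;
}

/* Fake card for demonstration: fails CRC a fixed number of times. */
struct fake_card {
	unsigned int crc_failures_left;
	uint32_t status;
	unsigned int calls;
};

static int fake_send_status(void *ctx, uint32_t *status)
{
	struct fake_card *card = ctx;

	card->calls++;
	if (card->crc_failures_left) {
		card->crc_failures_left--;
		return -EILSEQ;
	}
	*status = card->status;
	return 0;
}
```

The status word is only handed back once a response has arrived without
a CRC error. If every attempt fails, the caller gets the error and still
needs some other way to decide when the switch is done.
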
> >>>>
> >>>> I agree: ignoring CRC errors and then expecting the status in the response
> >>>> to be correct doesn't make sense.
> >>>>
> >>>> However, it raises the question of what to do if there are always CRC errors
> >>>> e.g. if it only works without CRC errors once the mode and frequency are
> >>>> changed in the host controller.
> >>>>
> >>>>> I am wondering whether this actually is the main reason why we
> >>>>> think polling isn't working for some cases. And perhaps that was the
> >>>>> original problem Chaotian was trying to solve?
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> Does Chaotian have a real problem since his driver has busy detection anyway?
> >>>
> >>> In fact, I have not encountered CRC errors on CMD13. I have tried several
> >>> eMMC cards; after the mode switch, CMD13 only ever gets a 0x800 response,
> >>> and we cannot tell from a 0x800 response whether the card is busy.
> >>
> >> Does it change to 0x900 when it is not busy?
> >>
> > No, it will not change to 0x900 when it is not busy.
> > 
> >> But anyway the question was: do you have busy detection in your driver?
> >>
> > The driver has busy detection via ops->card_busy(), but it seems to be the
> > MMC core layer's responsibility to ensure that the card is not busy when
> > the driver starts to issue commands.
> 
> I tried a card here.  The time between HS switch response and busy
> de-assertion was only 58us i.e. practically instant.  The CMD6 response was
> 0x800 but the subsequent CMD13 response was 0x900.
> 
> How long does it take your failing card to switch to HS?
> 
It depends on the eMMC chip type: some are very fast and some take
several ms. I just tested a Sandisk-SDIN9D-S2; CMD13 also gets a 0x800
response after busy de-assertion.
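
For reference, a small sketch of how the two status values seen in this
thread decode. The bit positions follow the standard R1 card status
layout (bit 8 is READY_FOR_DATA, bits 12:9 are CURRENT_STATE). The
helper functions are hypothetical and only there for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* R1 card status fields: bit 8 READY_FOR_DATA, bits 12:9 CURRENT_STATE. */
#define R1_READY_FOR_DATA	(1u << 8)
#define R1_CURRENT_STATE(x)	(((x) & 0x00001E00u) >> 9)
#define R1_STATE_TRAN		4
#define R1_STATE_PRG		7

/* Hypothetical helper: extract the CURRENT_STATE field. */
static unsigned int r1_state(uint32_t status)
{
	return R1_CURRENT_STATE(status);
}

/* Hypothetical helper: test the READY_FOR_DATA bit. */
static bool r1_ready_for_data(uint32_t status)
{
	return (status & R1_READY_FOR_DATA) != 0;
}

/*
 * Hypothetical strict check: treat the card as idle only when it is in
 * the tran state and also reports READY_FOR_DATA.
 */
static bool r1_looks_idle(uint32_t status)
{
	return r1_state(status) == R1_STATE_TRAN && r1_ready_for_data(status);
}
```

Both 0x800 and 0x900 decode to the tran state; they differ only in
READY_FOR_DATA. So a poll that only looks for the prg state treats both
as "not busy", while a strict check like r1_looks_idle() would never
finish on a card that keeps answering 0x800 after busy de-assertion.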

