Re: max_discard anomaly on certain Sandisk eMMC

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Tue, 17 Dec 2013 13:25:24 +0100

On 17 December 2013 12:20, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> On 17/12/13 12:04, Ulf Hansson wrote:
>> On 17 December 2013 09:17, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>>> On 17/12/13 01:18, Stephen Warren wrote:
>>>> On 12/13/2013 03:43 PM, Stephen Warren wrote:
>>>>> On one of my eMMC devices, I see the following results from calling
>>>>> mmc_do_calc_max_discard() with various parameters:
>>>>>
>>>>> [    3.057263] MMC_DISCARD_ARG max_discard 1
>>>>> [    3.057266] MMC_ERASE_ARG   max_discard 4096
>>>>> [    3.057267] MMC_TRIM_ARG    max_discard 1
>>>>>
>>>>> This causes mmc_calc_max_discard() to return 1, which makes the discard
>>>>> IOCTL extremely slow.
>>>>
>>>> Further investigation shows that if I make a few hacks that essentially
>>>> revert e056a1b5b67b "mmc: queue: let host controllers specify maximum
>>>> discard timeout":
>>>>
>>>> diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
>>>> index 357bbc54fe4b..e66af930d0e3 100644
>>>> --- a/drivers/mmc/card/queue.c
>>>> +++ b/drivers/mmc/card/queue.c
>>>> @@ -167,13 +167,15 @@ static void mmc_queue_setup_discard(struct
>>>> request_queue *q,
>>>>               return;
>>>>
>>>>       queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
>>>> -     q->limits.max_discard_sectors = max_discard;
>>>> +     q->limits.max_discard_sectors = UINT_MAX;
>>>>       if (card->erased_byte == 0 && !mmc_can_discard(card))
>>>>               q->limits.discard_zeroes_data = 1;
>>>>       q->limits.discard_granularity = card->pref_erase << 9;
>>>>       /* granularity must not be greater than max. discard */
>>>> +#if 0
>>>>       if (card->pref_erase > max_discard)
>>>>               q->limits.discard_granularity = 0;
>>>> +#endif
>>>>       if (mmc_can_secure_erase_trim(card))
>>>>               queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, q);
>>>>  }
>>>>
>>>> I end up with:
>>>>
>>>> $ cat /sys/.../block/mmcblk1/queue# cat discard_granularity
>>>> 2097152
>>>> $ cat /sys/.../block/mmcblk1/queue# cat discard_max_bytes
>>>> 2199023255040
>>>> $ cat /sys/.../block/mmcblk1/queue# cat discard_zeroes_data
>>>> 1
>>>>
>>>> With those values, mke2fs is fast, and I validated that "blkdiscard"
>>>> works; I filled a large partition with /dev/urandom, executed
>>>> "blkdiscard" on the 4M at the start, and saw zeroes when reading the
>>>> discarded part back.
>>>>
>>>> This implies that the issue is simply the operation of
>>>> mmc_calc_max_discard(), rather than the eMMC device mis-reporting its
>>>> discard abilities, doesn't it?
>>>
>>> No.
>>>
>>> The underlying problem is a combination of:
>>>         a) JEDEC specified very large timeouts for erase operations e.g. can be
>>> minutes for large erases
>>>         b) SDHCI controllers have been implemented with high frequency timeout
>>> clocks which limit the maximum timeout to a few seconds
>>>         c) It is not possible to disable the timeout on SDHCI
>>>
>>> What a) means is that you can get away with much larger erases than you can
>>> specify the timeout for - which is what you have discovered.
>>>
>>> To understand the timeouts, you should manually do the calculations.
>>>
>>> Also note, that using HC Erase Size may help (MMC_CAP2_HC_ERASE_SZ), but
>>> beware of the partitioning implications of changing that.
>>>
>>> The best solution is to change the hardware to use the lowest possible
>>> frequency timeout clock e.g. a 1KHz timeout clock could support timeouts of
>>> up to 36 hours.
>>
>> Don't know the details about the limitations for SDHCI, but I guess
>> similar exists for other controllers as well.
>
> Not necessarily.  For example omap_hsmmc just disables the timeout for erase
> operations.
>

Interesting! :-) Actually, it is disabling the data time out and
keeping the command time out.

>>
>> I do get the impression that we have got a problem in the mmc
>> core/block layer for how erase/trim/discard timeouts are being
>> handled.
>>
>> I don't think the mmc hw-controller should be waiting for the R1B
>> response from the CMD38 as long as this "timeout" we are discussing
>> here. According to the spec, at least how I interpret it, the card
>> should respond rather quickly to CMD38, then it will assert the DAT0
>> line to indicate busy.
>>
>> The total time the card is allowed to stay busy, that is what the
>> timeout specifies. We may either use a mmc hw-controller busy
>> detection mechanism or send CMD13 to poll for status. The latter is
>> somewhat already being handled in mmc_do_erase(), but we are using
>> "MMC_CORE_TIMEOUT_MS" instead of the correct timeout.
>>
>> Kind regards
>> Ulf Hansson
>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html