Re: [PATCH v4 00/25] MMC/SDHCI fixes

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Thu, 4 Feb 2016 11:55:09 +0100

+ Adrian

On 29 January 2016 at 10:43, Russell King - ARM Linux
<linux@xxxxxxxxxxxxxxxx> wrote:
> On Mon, Dec 21, 2015 at 11:39:56AM +0000, Russell King - ARM Linux wrote:
>> This all started with a report from an iMX6 Hummingboard user that
>> they were seeing regular "mmc0: Got data interrupt 0x00000002 even
>> though no data operation was in progress." messages with 4.4-rc and
>> additional patches to enable UHS support in DT for the platform.
>>
>> When I tested 4.4-rc on an iMX6 Hummingboard with UHS support enabled,
>> I found several other issues.  This patch series addresses some of
>> the issues.
>>
>> However, while testing on Armada 388, an issue was found there, and
>> a fix is also included in this patch set, however it does not depend
>> on the rest of the patch series.
>>
>> We all know that the SDHCI driver is in poor state, and Ulf's desire
>> to see it turned into a library (which actually comes from myself.)
>> However, with the driver in such a poor state, it makes sense to fix
>> a few of the issues first, especially if they result in the code
>> shrinking - which they do.
>>
>> However, given what I've found below, I have to wonder how many of the
>> quirks that we have in this driver are down to the poor code rather than
>> real hardware problems: how many of them are due to poorly investigated
>> bugs.
>>
>> This series starts off by shutting up some kernel messages that should
>> not be verbose:
>> - the voltage-ranges DT property is optional, so we should not print
>>   a message saying that it's missing from DT.  That suggests it isn't
>>   optional.  As no ill effect comes from not specifying it, I've
>>   dropped this message to debug level.
>>
>> - reworking the voltage-ranges code so that callers know whether the
>>   optional voltage-ranges property was specified.
>>
>> - "retrying because a re-tune was needed" - a re-tune is something that
>>   is part of the higher speed SD specs, and it's required to happen
>>   periodically.  So it's expected and normal behaviour.  There's no need
>>   to print this to the kernel log.
>>
>> - When tuning fails, rather than a bland "mmc0: tuning execution failed"
>>   message, print the error code so that we can see what the reason for
>>   the failure is.  MMC fails on this point in many places.
>>
>> - Move the initialisation of the command ->error member.  It looks to
>>   me as if this is not required; mmc_start_request() does this already,
>>   as it seems the rest of the MMC core does too.  Maybe this should be
>>   removed altogether?
>>
>> - Testing error bits, then setting the error, then testing the error
>>   value, and completing the command is a very lengthy way to deal with
>>   errors here.  Test the error bits, and then decide what error occurred
>>   before completing.  (This code is actually buggy, see the next change).
>>
>> - If a command _response_ CRC error occurs, we can't be certain whether
>>   the card received the command successfully, and has started processing
>>   it.  If it has, and it results in data being read from the card, the
>>   card will enter data transfer state, and start sending data to the
>>   host.  Therefore, it is completely wrong to blindly terminate the
>>   command, especially when such termination fails to clean _anything_
>>   up - eg, it leaves DMA in place so the host can continue DMAing to the
>>   memory, and it leaves buffers DMA mapped - which eventually annoys the
>>   DMA API debugging.  Fix this by assuming that a CRC error is a receive
>>   side error.  If the card did not correctly receive the command, it
>>   will not initiate a data transfer, and we should time out after the
>>   card specified timeout.
>>
>>   With the unmodified mainline code, even if we reset the state machines
>>   when finishing a request in error, the _card_ itself may well still be
>>   in data transfer mode when this happens.  How can we be sure what state
>>   the data side is actually in?  Could this be why (eg) sdhci-esdhc-imx.c
>>   has this:
>>
>>         /* FIXME: delay a bit for card to be ready for next tuning due to errors */
>>         mdelay(1);
>>
>>   And how many more such _hacks_ are working around similar problems?
>>
>> - The unaligned buffer handling is a total waste of resources in its
>>   current form.  We DMA map, sync, and unmap it on every transfer,
>>   whether the buffer is used or not.  For MMC/SD transfers, these will
>>   be coming from the VFS/MM layers, which operate through the page
>>   cache - hence the vast majority of buffers will be page aligned.
>>   Performance measurements on iMX6 show that this additional DMA
>>   maintanence is responsible for a 10% reduction in read transfer
>>   performance.  Switch this to use a DMA coherent mapping instead, which
>>   needs no DMA maintanence.
>>
>> - sdhci_adma_table_post() is badly coded: we walk the scatterlist
>>   looking for any buffers which might not be aligned (which can be a
>>   rare event, see above) and then we only do something if we find an
>>   unaligned buffer _and_ it's a read transfer.  So for a write transfer,
>>   this scatterlist walking is a total waste of CPU cycles.  Testing the
>>   transfer direction is much cheaper than walking the scatterlist.
>>   Re-organise the code to do the cheap test first.
>>
>> - There are two paths to identical DMA unmapping code (the code around
>>   dma_unmap_sg()) in sdhci_finish_data() and its child function
>>   sdhci_adma_table_post().  Trivially simplify this duplication.
>>
>> - Move sdhci_pre_dma_transfer() so we avoid needing to declare it.
>>
>> - Do the same thing with sdhci_prepare_data() and sdhci_pre_dma_transfer()
>>   as with sdhci_finish_data() and sdhci_adma_table_post().
>>
>> - Fix the mess that is the data segment host_cookie handling.  The
>>   existing code seems like a candidate for the obfuscated coding prize!
>>   Four patches address this, turning it into something much easier to
>>   understand, which then allows (finally)...
>>
>> - The DMA API leak which occurs when we have mapped the DMA buffers in
>>   sdhci_prepare_data() and an error occurs early on in the processing
>>   of a request.
>>
>> - A change (mentioned above) which fixes a prior commit for Armada 38x
>>   using sdhci-pxav3.c, found while trying to get the higher speed modes
>>   working on these SoCs.  Clearly the existing code is broken, this at
>>   least resolves some of the breakage by allowing the correct high-order
>>   SDHCI capabilities through.
>>
>> - Fixing another DMA API leak with the pre/post request handling, where
>>   a current request can change the state of the SDHCI_REQ_USE_DMA, and
>>   thus stop a mapped request being unmapped.
>>
>> - Fixing the SDHCI timeout calculation code to correctly round timeouts
>>   up rather than down, and correctly calculate the time (in microseconds)
>>   for a certain number of card clocks.
>>
>> - Consolidating the SDHCI address/size alignment handling so we only walk
>>   the scatterlist once instead of twice.
>>
>> Further patches are necessary as there are still a number of observable
>> problems when using this driver on iMX6 with UHS cards.  Patches up to
>> and including "mmc: sdhci: further fix for DMA unmapping in
>> sdhci_post_req()" should be considered for inclusion in the stable
>> kernel trees.
>>
>> I've tested these changes on iMX6 at UHS speeds, and Armada 388 SoCs
>> at HS speeds so far.  Others should test them.
>
> One comment from Adrian addressed.
>
>  drivers/mmc/card/block.c       |   4 +-
>  drivers/mmc/core/core.c        |  17 +-
>  drivers/mmc/host/sdhci-pxav3.c |   6 +-
>  drivers/mmc/host/sdhci.c       | 444 ++++++++++++++++++-----------------------
>  drivers/mmc/host/sdhci.h       |   4 +-
>  5 files changed, 213 insertions(+), 262 deletions(-)
>
> --
> RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.

I have applied patch 1 to 4, for next!

The rest are sdhci patches, which from now on needs acks from Adrian
for me to pick them up.

Thanks and kind regards!
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html