RE: [PATCH 2/3] mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Russell,

> -----Original Message-----
> From: Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx>
> Sent: Monday, September 23, 2019 8:30 PM
> To: Y.b. Lu <yangbo.lu@xxxxxxx>
> Cc: Robin Murphy <robin.murphy@xxxxxxx>; dann frazier
> <dann.frazier@xxxxxxxxxxxxx>; Will Deacon <will.deacon@xxxxxxx>; Nicolin
> Chen <nicoleotsuka@xxxxxxxxx>; Christoph Hellwig <hch@xxxxxx>; Meenakshi
> Aggarwal <meenakshi.aggarwal@xxxxxxx>; Adrian Hunter
> <adrian.hunter@xxxxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>;
> linux-mmc@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 2/3] mmc: sdhci-of-esdhc: set DMA snooping based on
> DMA coherence
> 
> On Mon, Sep 23, 2019 at 12:10:16PM +0100, Russell King - ARM Linux admin
> wrote:
> > On Mon, Sep 23, 2019 at 09:41:34AM +0100, Russell King - ARM Linux
> admin wrote:
> > > On Mon, Sep 23, 2019 at 02:51:15AM +0000, Y.b. Lu wrote:
> > > > Hi Russell,
> > > >
> > > > > -----Original Message-----
> > > > > From: Russell King <rmk@xxxxxxxxxxxxxxx> On Behalf Of Russell
> > > > > King
> > > > > Sent: Sunday, September 22, 2019 6:27 PM
> > > > > To: Robin Murphy <robin.murphy@xxxxxxx>; dann frazier
> > > > > <dann.frazier@xxxxxxxxxxxxx>; Will Deacon <will.deacon@xxxxxxx>;
> > > > > Nicolin Chen <nicoleotsuka@xxxxxxxxx>; Y.b. Lu
> > > > > <yangbo.lu@xxxxxxx>; Christoph Hellwig <hch@xxxxxx>
> > > > > Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>; Ulf Hansson
> > > > > <ulf.hansson@xxxxxxxxxx>; linux-mmc@xxxxxxxxxxxxxxx
> > > > > Subject: [PATCH 2/3] mmc: sdhci-of-esdhc: set DMA snooping based
> > > > > on DMA coherence
> > > > >
> > > > > We must not unconditionally set the DMA snoop bit; if the DMA
> > > > > API is assuming that the device is not DMA coherent, and the
> > > > > device snoops the CPU caches, the device can see stale cache lines
> brought in by speculative prefetch.
> > > > >
> > > > > This leads to the device seeing stale data, potentially
> > > > > resulting in corrupted data transfers.  Commonly, this results in a
> descriptor fetch error such as:
> > > > >
> > > > > mmc0: ADMA error
> > > > > mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> > > > > mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
> > > > > mmc0: sdhci: Blk size:  0x00000008 | Blk cnt:  0x00000001
> > > > > mmc0: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
> > > > > mmc0: sdhci: Present:   0x01f50008 | Host ctl: 0x00000038
> > > > > mmc0: sdhci: Power:     0x00000003 | Blk gap:  0x00000000
> > > > > mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x000040d8
> > > > > mmc0: sdhci: Timeout:   0x00000003 | Int stat: 0x00000001
> > > > > mmc0: sdhci: Int enab:  0x037f108f | Sig enab: 0x037f108b
> > > > > mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00002202
> > > > > mmc0: sdhci: Caps:      0x35fa0000 | Caps_1:   0x0000af00
> > > > > mmc0: sdhci: Cmd:       0x0000333a | Max curr: 0x00000000
> > > > > mmc0: sdhci: Resp[0]:   0x00000920 | Resp[1]:  0x001d8a33
> > > > > mmc0: sdhci: Resp[2]:   0x325b5900 | Resp[3]:  0x3f400e00
> > > > > mmc0: sdhci: Host ctl2: 0x00000000
> > > > > mmc0: sdhci: ADMA Err:  0x00000009 | ADMA Ptr:
> > > > > 0x000000236d43820c
> > > > > mmc0: sdhci: ============================================
> > > > > mmc0: error -5 whilst initialising SD card
> > > > >
> > > > > but can lead to other errors, and potentially direct the SDHCI
> > > > > controller to read/write data to other memory locations (e.g. if
> > > > > a valid descriptor is visible to the device in a stale cache
> > > > > line.)
> > > > >
> > > > > Fix this by ensuring that the DMA snoop bit corresponds with the
> > > > > behaviour of the DMA API.  Since the driver currently only
> > > > > supports DT, use of_dma_is_coherent().  Note that
> > > > > device_get_dma_attr() can not be used as that risks re-introducing this
> bug if/when the driver is converted to ACPI.
> > > >
> > > > [Y.b. Lu] May I have your suggestion how to check this condition with
> ACPI?
> > > > Although we didn’t support APCI on upstream for this driver now, as I
> know we had already had ACPI used internally and would be upstreamed.
> > > > Thanks.
> > >
> > > The short answer is, I don't believe there is an equivalent for ACPI.
> > > However, it's a question for ACPI people, not me, as I have little
> > > knowedge about ACPI.
> > >
> > > From what I've looked at so far, for ACPI, the decision used for
> > > dev->dma_coherent() / dev_is_dma_coherent() (which controls the DMA
> > > API behaviour) is not available - this is a decision by the SMMU code.
> > >
> > > See:
> > > 	drivers/acpi/arm64/iort.c:arm_smmu_dma_configure()
> > > 	drivers/acpi/scan.c:acpi_dma_configure()
> > > 	arch/arm64/mm/dma-mapping.c:arch_setup_dma_ops()
> > >
> > > The decision is made whether the SMMU is in coherent walk mode.
> > >
> > > The proposed interface to use is device_get_dma_attr().  For OF,
> > > this works as expected - it mirrors of_dma_is_coherent().  However,
> > > for ACPI it does not - under ACPI, it uses acpi_get_dma_attr().
> > >
> > > See:
> > > 	drivers/acpi/scan.c:acpi_get_dma_attr()
> > >
> > > This can return one of three states.  Let's concentrate on the
> > > coherent/non-coherent states.  This depends on
> > > adev->flags.coherent_dma.  This gets set by acpi_init_coherency(),
> > > which looks up the _CCA property in the device handle.
> > >
> > > It looks to me like it is entirely possible for ACPI to say that,
> > > for example, the SMMU is coherent, which will cause
> > > dev->dma_coherent to be set, but the device (and it's parents) not
> > > to have _CCA indicating that the device is coherent.
> > >
> > > It seems that the only way this could be guaranteed is if the ESDHC
> > > was a child of the SMMU - but I don't know whether that is the case
> > > or not.  If it isn't, using device_get_dma_attr() will result in a
> > > coherency disagreement between the DMA API and the device, which
> > > will lead to ADMA errors and potential data corruption.
> > >
> > > There may be some subtle reason this can't happen, but from merely
> > > looking at the code, I'd say using device_get_dma_attr() here would
> > > be dangerous.
> >
> > The best I can come up with is something like:
> >
> > 	enum dev_dma_attr attr, expected;
> >
> > 	attr = device_get_dma_attr(dev);
> > 	expected = dev_is_dma_coherent(dev) ? DEV_DMA_COHERENT :
> > 			DEV_DMA_NON_COHERENT;
> >
> > 	WARN_ON(attr != expected);
> >
> > Maybe even aborting the probe if they don't agree.  However, it seems
> > dev_is_dma_coherent() is an IOMMU/DMA API implementation detail that
> > only IOMMU drivers and arch code are supposed to use.
> >
> > What I'm hearing at the moment from Jon Nettleton is that the NXP ACPI
> > description does not include an IORT table and doesn't mention the
> > SMMU.  If I'm reading the code correctly, that means the Linux DMA API
> > will assume all devices are DMA non-coherent - and if we use
> > device_get_dma_attr() for this, and the ACPI device description has
> > _CCA=1, we'll end up with potentially data corrupting mismatched DMA
> > coherency expectations.
> 
> Okay, so digging in to the edk2-platforms code at:
> 
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsource.
> codeaurora.org%2Fexternal%2Fqoriq%2Fqoriq-components%2Fedk2-platform
> s%2F&amp;data=02%7C01%7Cyangbo.lu%40nxp.com%7Cb080efc2fa564fb98
> e2108d74021b99e%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C
> 637048385907777184&amp;sdata=5MQWxAdNHvMwQS21QaUzKjMZGTUUl
> DP7GxUKN1mEpSw%3D&amp;reserved=0
> 
> I find this:
> 
> 20 Scope(_SB)
> 21 {
> 22   Device(SDC0) {
> 23     Name(_HID, "NXP0003")
> 24     Name(_CID, "PNP0D40")
> 25     Name(_CCA, 0)
> ...
> 50   Device(SDC1) {
> 51     Name(_HID, "NXP0003")
> 52     Name(_CID, "PNP0D40")
> 53     Name(_CCA, 0)
> 
> So, ACPI describes these devices as DMA _non-coherent_.
> 
> The AcpiTables.inf file has:
> 
> 35  /* Iort.asl
> ...
> 
> So, my understanding is that as there is no IORT, the DMA API will consider
> ACPI devices to be non-coherent, and will do cache maintenance and for
> dma_alloc_coherent(), will remap memory uncached.
> 
> So, it seems that given the current ACPI description, DMA snooping must
> *not* be enabled for the eSDHC controller.
> 
> Checking the SATA situation:
> 
> 20 Scope(_SB)
> 21 {
> 22   Device(SAT0) {
> 23     Name(_HID, "NXP0004")
> 24     Name(_CCA, 1)
> 
> So, ACPI is saying that the device _is_ DMA coherent, but given the lack of
> IORT, the DMA API will treat this as DMA non-coherent.  I don't have the
> knowledge to know what the SATA driver does, whether the SATA hardware is
> coherent or not, or whether it contains a bit to control it, but this basically
> looks wrong to me given what I understand so far.
> 
> From what I can gather, _CCA must not be explicitly set to 1 or allowed to
> default to 1 on ARM64 for any ACPI device without an IORT table being
> present to tell the DMA API that the ACPI devices are coherent.
> 

[Y.b. Lu] Thank you very much for your fix-up and the analysis on the ACPI. That's very helpful for us.
I know little about ACPI either but I had added Meenakshi who worked on the ACPI in case she wants to know it.

Thanks again :)

> --
> RMK's Patch system:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ar
> mlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=02%7C01%7Cyangbo.l
> u%40nxp.com%7Cb080efc2fa564fb98e2108d74021b99e%7C686ea1d3bc2b4
> c6fa92cd99c5c301635%7C0%7C0%7C637048385907777184&amp;sdata=Dtk
> M9lpicCv%2BYXdQyX6uIWsLKZBItJEwwt0ZKdyYsZo%3D&amp;reserved=0
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps
> up According to speedtest.net: 11.9Mbps down 500kbps up




[Index of Archives]     [Linux Memonry Technology]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux