On Fri, 04 Nov 2022 16:42:29 +0100, Kai Vehmanen wrote: > > Hi, > > [ and sorry, I had the bounces address in cc originally, routing > reply to list as well ] > > On Fri, 4 Nov 2022, Takashi Iwai wrote: > > > On Fri, 04 Nov 2022 15:56:50 +0100, Kai Vehmanen wrote: > > > I've been debugging a SOF DSP load fail on VT-D/IOMMU systems [1] and > > > that brought me to these two commits from you: > > > .. but in rare low-memory cases, the dma_alloc_noncontiguous() > > > call will fail in core/memalloc.c and we go to the fallback path. > > > > > > So we go to snd_dma_sg_fallback_alloc(), but here it seems something is > > > missing. The pages are not allocated with DMA mapping API anymore, so > > > IOMMU won't know about the memory and in our case the DSP load will fail > > > to a IOMMU fault. > [...] > > > > Hrm, that's a tough issue. Basically if dma_alloc_noncontiguous() > > fails, it's really out of memory -- at least under the situation where > > IOMMU is enabled. The fallback path was introduced for the case where > > there is no IOMMU and noncontiguous allocation fails from the > > beginning. > > ack. This matches with the bug reports. These happen rarely and on systems > with a lot of memory pressure. We have recently submitted a few > features&fixes that will further reduce these allocs, so it probably is > better outcome to have a plain out of memory error. > > > And, what makes things more complicated is that snd_dma_dev_alloc() > > cannot be used for SG pages when IOMMU is involved. We can't get the > > proper pages for mapping SG from the DMA address in that case; some > > systems may work with virt_to_page() but it's not guranteed to work. > > Aa, ok, so we need to use dma_alloc_noncontiguous() to do this properly > and if it returns ENOMEM, that's what it is then, right. > > > So, I believe there are two issues: > > 1. The firmware allocation fails with dma_alloc_noncontiguous() with > > IOMMU enabled > > > > 2. The helper goes to the fallback path and occasionally it worked, > > although the pages can't be used with IOMMU. > > > > Basically when 1 happens, we should just return an error without going > > to fallback path. But it won't "fix" your case? > > I think an explicit error would be best. The problem now is that the > driver will think the allocation (and mapping to device) is fine and > proceeds to program the hardware to use the address. This will then create > an IOMMU fault down the line that is not so straighforward to recover from > (worst case is that a full device level reset needs to be done). And given > driver doesn't know it got a faulty mapping, it's hard to make the > decision why the fault happened. OK, then what I posted in another mail (it went to nirvana) might work. Attached below again. Takashi -- 8< -- --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -9,6 +9,7 @@ #include <linux/slab.h> #include <linux/mm.h> #include <linux/dma-mapping.h> +#include <linux/dma-map-ops.h> #include <linux/genalloc.h> #include <linux/highmem.h> #include <linux/vmalloc.h> @@ -541,19 +542,20 @@ static void *snd_dma_noncontig_alloc(struct snd_dma_buffer *dmab, size_t size) struct sg_table *sgt; void *p; - sgt = dma_alloc_noncontiguous(dmab->dev.dev, size, dmab->dev.dir, - DEFAULT_GFP, 0); - if (!sgt) { #ifdef CONFIG_SND_DMA_SGBUF + if (!get_dma_ops(dmab->dev.dev)) { if (dmab->dev.type == SNDRV_DMA_TYPE_DEV_WC_SG) dmab->dev.type = SNDRV_DMA_TYPE_DEV_WC_SG_FALLBACK; else dmab->dev.type = SNDRV_DMA_TYPE_DEV_SG_FALLBACK; return snd_dma_sg_fallback_alloc(dmab, size); -#else - return NULL; -#endif } +#endif + + sgt = dma_alloc_noncontiguous(dmab->dev.dev, size, dmab->dev.dir, + DEFAULT_GFP, 0); + if (!sgt) + return NULL; dmab->dev.need_sync = dma_need_sync(dmab->dev.dev, sg_dma_address(sgt->sgl)); @@ -857,7 +859,7 @@ static const struct snd_malloc_ops snd_dma_noncoherent_ops = { /* * Entry points */ -static const struct snd_malloc_ops *dma_ops[] = { +static const struct snd_malloc_ops *snd_dma_ops[] = { [SNDRV_DMA_TYPE_CONTINUOUS] = &snd_dma_continuous_ops, [SNDRV_DMA_TYPE_VMALLOC] = &snd_dma_vmalloc_ops, #ifdef CONFIG_HAS_DMA @@ -883,7 +885,7 @@ static const struct snd_malloc_ops *snd_dma_get_ops(struct snd_dma_buffer *dmab) if (WARN_ON_ONCE(!dmab)) return NULL; if (WARN_ON_ONCE(dmab->dev.type <= SNDRV_DMA_TYPE_UNKNOWN || - dmab->dev.type >= ARRAY_SIZE(dma_ops))) + dmab->dev.type >= ARRAY_SIZE(snd_dma_ops))) return NULL; - return dma_ops[dmab->dev.type]; + return snd_dma_ops[dmab->dev.type]; }