Hello,

On Thu, Jun 18, 2020 at 06:14:49PM -0700, Roman Gushchin wrote:
> I agree. The whole
>
> 	page = alloc_pages_node(nid, alloc_flags, order);
> 	if (!page)
> 		continue;
> 	if (!order)
> 		break;
> 	if (!PageCompound(page)) {
> 		split_page(page, order);
> 		break;
> 	} else if (!split_huge_page(page)) {
> 		break;
> 	}
>
> looks very suspicious to me.
> My wild guess is that gfp flags changed somewhere above, so we hit
> the branch which was never hit before.

You're right to be suspicious about the above: split_huge_page on a
regular page allocated by a driver was never meant to work.

The PageLocked BUG_ON is just a symptom of a bigger issue.
split_huge_page may survive, but the page will stay compound and in
turn it must be freed as compound.

The respective free method doesn't even contemplate freeing compound
pages. The only way the free method can survive is by forcefully
removing __GFP_COMP from the allocation, which was perhaps set here
(there are not that many __GFP_COMP in that directory):

	static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
	{
		gfp_t gfp_flags;

		gfp_flags = GFP_KERNEL
			| __GFP_COMP	/* compound page lets parts be mapped */

And I'm not sure what the comment means here: compound or non-compound
doesn't make a difference when you map it. It's not a THP, the
mappings must be handled manually, so nothing in the mapping code
should check PG_compound anyway.

Something like the patch below may improve things. It's an untested
quick hack, and it assumes it's always a bug to set up a compound page
for these DMA allocations; given the API, that's probably a correct
assumption.

Compound is slower, so unless you need it you can avoid it, and then
split_page will give you contiguous memory at page granularity.
Ideally the code shouldn't call split_page at all: it should free
everything at once by keeping track of the order and returning the
order to the caller. The API can't do that right now, as it returns a
plain array that can only represent individual small pages.
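To make the "keep track of the order" idea a bit more concrete, here
is a rough userspace model of it. This is only a sketch, not kernel
code: every demo_* name is hypothetical, fake page frame numbers stand
in for struct page pointers, and nothing here is an existing API. It
only shows that a (first page, order) descriptor per chunk is enough
to free each chunk in one go without ever splitting it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: one contiguous block of 2^order pages. */
struct demo_chunk {
	unsigned long first_pfn;	/* stand-in for a struct page pointer */
	unsigned int order;
};

/* Largest order with (1 << order) <= count, capped at max_order. */
static unsigned int demo_pick_order(size_t count, unsigned int max_order)
{
	unsigned int order = 0;

	assert(count > 0);
	while (order < max_order && ((size_t)2 << order) <= count)
		order++;
	return order;
}

/*
 * Describe count pages as chunks and return how many chunks were written.
 * A real allocator would call something like alloc_pages_node() here; this
 * model only hands out consecutive fake page frame numbers.
 */
static size_t demo_alloc_chunks(size_t count, unsigned int max_order,
				struct demo_chunk *chunks, size_t max_chunks)
{
	unsigned long next_pfn = 0;
	size_t n = 0;

	while (count && n < max_chunks) {
		unsigned int order = demo_pick_order(count, max_order);

		chunks[n].first_pfn = next_pfn;
		chunks[n].order = order;
		next_pfn += 1UL << order;
		count -= (size_t)1 << order;
		n++;
	}
	return n;
}

/*
 * Release every chunk as a whole, the way one free call per (page, order)
 * pair would. Returns the number of pages covered.
 */
static size_t demo_free_chunks(const struct demo_chunk *chunks, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++)
		freed += (size_t)1 << chunks[i].order;
	return freed;
}
```

With 13 pages and a maximum order of 3, this model describes the
buffer with three chunks (orders 3, 2 and 0) instead of 13 separate
page pointers, and the free side needs neither split_page nor any
knowledge of compound pages.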

Once this is resolved, you may want to check your config, iommu
passthrough sounds more optimal for a soundcard.

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f68a62c3c32b..3dfbc010fa83 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -499,6 +499,10 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 
 	/* IOMMU can map any pages, so himem can also be used here */
 	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+	if (unlikely(gfp & __GFP_COMP)) {
+		WARN();
+		gfp &= ~__GFP_COMP;
+	}
 
 	while (count) {
 		struct page *page = NULL;
@@ -522,13 +526,8 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 				continue;
 			if (!order)
 				break;
-			if (!PageCompound(page)) {
-				split_page(page, order);
-				break;
-			} else if (!split_huge_page(page)) {
-				break;
-			}
-			__free_pages(page, order);
+			split_page(page, order);
+			break;
 		}
 		if (!page) {
 			__iommu_dma_free_pages(pages, i);
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
index 6850d13aa98c..378f5a36ec5f 100644
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -28,7 +28,6 @@ static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
 	gfp_t gfp_flags;
 
 	gfp_flags = GFP_KERNEL
-		| __GFP_COMP	/* compound page lets parts be mapped */
 		| __GFP_NORETRY /* don't trigger OOM-killer */
 		| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
 	dmab->area = dma_alloc_coherent(dmab->dev.dev, size, &dmab->addr,