On Tue, 9 Jun 2020, Christoph Hellwig wrote:

> > Working theory is that CONFIG_DMA_NONCOHERENT_MMAP getting set is causing
> > the error_code in the page fault path. Debugging with Alex off-thread we
> > found that dma_{alloc,free}_from_pool() are not getting called from the
> > new code in dma_direct_{alloc,free}_pages() and he has not enabled
> > mem_encrypt.
>
> While DMA_COHERENT_POOL absolutely should not select DMA_NONCOHERENT_MMAP
> (and you should send your patch either way), I don't think it is going
> to make a difference here, as DMA_NONCOHERENT_MMAP just means we
> allows mmaps even for non-coherent devices, and we do not support
> non-coherent devices on x86.
>

We haven't heard yet whether disabling DMA_NONCOHERENT_MMAP fixes Aaron's
BUG(). The patch also included some other debugging hints that will be
printed in case it doesn't, but I'll share what we have figured out so far.

In 5.7, his config didn't have DMA_DIRECT_REMAP or DMA_REMAP (it did
already have GENERIC_ALLOCATOR). AMD_MEM_ENCRYPT is set. In Linus's HEAD,
AMD_MEM_ENCRYPT now selects DMA_COHERENT_POOL, so it sets the two
aforementioned options.

We also figured out that dma_should_alloc_from_pool() is always false up
until the BUG().

So what else changed? Only the selection of DMA_REMAP and
DMA_NONCOHERENT_MMAP. The comment in the Kconfig about setting "an
uncached bit in the pagetables" led me to believe it may be related to the
splat he's seeing (reserved bit violation). So I suggested dropping
DMA_NONCOHERENT_MMAP from his Kconfig for testing purposes.

If this option should not be set implicitly for DMA_COHERENT_POOL, then I
assume we need yet another Kconfig option, since DMA_REMAP selected it
before and DMA_COHERENT_POOL selects DMA_REMAP :)

So do we want a DMA_REMAP_BUT_NO_DMA_NONCOHERENT_MMAP? Or decouple
DMA_REMAP from DMA_NONCOHERENT_MMAP and select the latter wherever the
former is selected (but not for DMA_COHERENT_POOL)? Something else?