On Wed, 20 May 2020, Greg Kroah-Hartman wrote:

> On Tue, May 19, 2020 at 05:41:15PM -0700, David Rientjes wrote:
> > Hi Greg and everyone,
> >
> > On all kernels, SEV enabled guests hit might_sleep() warnings when a
> > driver (nvme in this case) allocates through the DMA API in a
> > non-blockable context:
> >
> > BUG: sleeping function called from invalid context at mm/vmalloc.c:1710
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio
> > 2 locks held by fio/3383:
> >  #0: ffff93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0
> >  #1: ffffffffa52a61a0 (rcu_read_lock){....}, at: hctx_lock+0x1a/0xe0
> > CPU: 0 PID: 3383 Comm: fio Tainted: G W 5.5.10 #14
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  dump_stack+0x98/0xd5
> >  ___might_sleep+0x175/0x260
> >  __might_sleep+0x4a/0x80
> >  _vm_unmap_aliases+0x45/0x250
> >  vm_unmap_aliases+0x19/0x20
> >  __set_memory_enc_dec+0xa4/0x130
> >  set_memory_decrypted+0x10/0x20
> >  dma_direct_alloc_pages+0x148/0x150
> >  dma_direct_alloc+0xe/0x10
> >  dma_alloc_attrs+0x86/0xc0
> >  dma_pool_alloc+0x16f/0x2b0
> >  nvme_queue_rq+0x878/0xc30 [nvme]
> >  __blk_mq_try_issue_directly+0x135/0x200
> >  blk_mq_request_issue_directly+0x4f/0x80
> >  blk_mq_try_issue_list_directly+0x46/0xb0
> >  blk_mq_sched_insert_requests+0x19b/0x2b0
> >  blk_mq_flush_plug_list+0x22f/0x3b0
> >  blk_flush_plug_list+0xd1/0x100
> >  blk_finish_plug+0x2c/0x40
> >  iomap_dio_rw+0x427/0x490
> >  ext4_file_write_iter+0x181/0x5d0
> >  aio_write+0x109/0x1b0
> >  io_submit_one+0x7d0/0xfa0
> >  __x64_sys_io_submit+0xa2/0x280
> >  do_syscall_64+0x5f/0x250
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > There is a series of patches in Christoph's dma-mapping.git repo in the
> > for-next branch on track for 5.8:
> >
> > 1d659236fb43 dma-pool: scale the default DMA coherent pool size with memory capacity
> > 82fef0ad811f x86/mm: unencrypted non-blocking DMA allocations use coherent pools
> > 2edc5bb3c5cc dma-pool: add pool sizes to debugfs
> > 76a19940bd62 dma-direct: atomic allocations must come from atomic coherent pools
> > 54adadf9b085 dma-pool: dynamically expanding atomic pools
> > c84dc6e68a1d dma-pool: add additional coherent pools to map to gfp mask
> > e860c299ac0d dma-remap: separate DMA atomic pools from direct remap code
> >
> > We'd like to prepare backports to LTS kernels so that our guest images are
> > not modified by us and don't exhibit this issue.
> >
> > They are bigger than we'd like:
> >
> >  arch/x86/Kconfig            |   1 +
> >  drivers/iommu/dma-iommu.c   |   5 +-
> >  include/linux/dma-direct.h  |   2 +
> >  include/linux/dma-mapping.h |   6 +-
> >  kernel/dma/Kconfig          |   6 +-
> >  kernel/dma/Makefile         |   1 +
> >  kernel/dma/direct.c         |  56 ++++++--
> >  kernel/dma/pool.c           | 264 ++++++++++++++++++++++++++++++++++++
> >  kernel/dma/remap.c          | 121 +----------------
> >  9 files changed, 324 insertions(+), 138 deletions(-)
> >  create mode 100644 kernel/dma/pool.c
> >
> > But they apply relatively cleanly to more modern kernels like 5.4. We'd
> > like to backport these all the way to 4.19, however, otherwise guests
> > encounter these bugs.
> >
> > The changes to kernel/dma/remap.c, for example, simply moves code to the
> > new pool.c. But that original code is actually in arch/arm64 in 4.19 and
> > was moved in 5.0:
> >
> > commit 0c3b3171ceccb8830c2bb5adff1b4e9b204c1450
> > Author: Christoph Hellwig <hch@xxxxxx>
> > Date: Sun Nov 4 20:29:28 2018 +0100
> >
> >     dma-mapping: move the arm64 noncoherent alloc/free support to common code
> >
> > commit f0edfea8ef93ed6cc5f747c46c85c8e53e0798a0
> > Author: Christoph Hellwig <hch@xxxxxx>
> > Date: Fri Aug 24 10:31:08 2018 +0200
> >
> >     dma-mapping: move the remap helpers to a separate file
> >
> > And there are most certainly more dependencies to get a cleanly applying
> > series to 4.19.123. So the backports could be quite extensive.
> >
> > Peter Gonda <pgonda@xxxxxxxxxx> is currently handling these and we're
> > looking for advice: should we compile a full list of required backports
> > that would be needed to get a series that would only consist of minor
> > conflicts or is this going to be a non-starter?
>
> A full series would be good. Once these hit Linus's tree and show up in
> a -rc or two, feel free to send on the backports and we can look at them
> then.
>

Thanks Greg. I'll let Peter follow-up when he has the full list of
commits that we'll need with minimal conflicts to apply this series
cleanly as a kind of preview of what to expect the last week of June for
4.19 LTS :)