On Mon, Jan 09, 2017 at 11:34:55PM +0300, Nikita Yushchenko wrote: > I believe the bounce buffering code you refer to is not in SATA/SCSI/MMC > but in block layer, in particular it should be controlled by > blk_queue_bounce_limit(). [Yes there is CONFIG_MMC_BLOCK_BOUNCE but it > is something completely different, namely it is for request merging for > hw not supporting scatter-gather]. And NVMe also uses block layer and > thus should get same support. NVMe shouldn't have to call blk_queue_bounce_limit - blk_queue_bounce_limit is to set the DMA addressing limit of the device. NVMe devices must support unlimited 64-bit addressing and thus calling blk_queue_bounce_limit from NVMe does not make sense. That being said currently the default for a queue without a call to blk_queue_make_request which does the wrong thing on highmem setups, so we should fix it. In fact BLK_BOUNCE_HIGH as-is doesn't really make much sense these days as no driver should ever dereference pages passed to it directly. > Maybe fixing that, together with making NVMe use this API, could stop it > from issuing dma_map()s of addresses beyond mask. NVMe should never bounce, the fact that it currently possibly does for highmem pages is a bug. > As for PCI_DMA_BUS_IS_PHYS - ironically, what all current usages of this > macro in the kernel do is - *disable* bounce buffers in block layer if > PCI_DMA_BUS_IS_PHYS is zero. That's not ironic but the whole point of the macro (horrible name and the fact that it should be a dma_ops setting aside). > - architecture should stop breaking 64-bit DMA when driver attempts to > set 64-bit dma mask, > > - NVMe should issue proper blk_queue_bounce_limit() call based on what > is actually set mask, Or even better remove the call to dma_set_mask_and_coherent with DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA addressing, there is not point in trying to pretent it works without that > - and blk_queue_bounce_limit() should also be fixed to actually set > 0xffffffff limit, instead of replacing it with (max_low_pfn << > PAGE_SHIFT) as it does now. We need to kill off BLK_BOUNCE_HIGH, it just doesn't make sense to mix the highmem aspect with the addressing limits. In fact the whole block bouncing scheme doesn't make much sense at all these days, we should rely on swiotlb instead. > What I mean is some API to allocate memory for use with streaming DMA in > such way that bounce buffers won't be needed. There are many cases when > at buffer allocation time, it is already known that buffer will be used > for DMA with particular device. Bounce buffers will still be needed > cases when no such information is available at allocation time, or when > there is no directly-DMAable memory available at allocation time. For block I/O that is never the case.