On 2015/6/8 23:48, Vinod Koul wrote:
> On Mon, Jun 08, 2015 at 07:44:43PM +0800, Jiang Liu wrote:
>> On 2015/6/8 18:42, Vinod Koul wrote:
>>> On Tue, Jun 02, 2015 at 02:37:31PM +0800, Jiang Liu wrote:
>>>> Ccing Rafael, it's ACPI hotplug related.
>>>>
>>>> On 2015/6/2 14:36, Jiang Liu wrote:
>>>>> The dmaengine core assumes that async DMA devices will only be removed
>>>>> when they are no longer used, or that dma_async_device_unregister()
>>>>> will only be called by DMA driver exit routines. But this assumption is
>>>>> not true for the IOAT driver, which calls dma_async_device_unregister()
>>>>> from ioat_remove(). So the current IOAT driver doesn't support device
>>>>> hot-removal, because hot-removing an in-use IOAT device may crash the
>>>>> system.
>>>>>
>>>>> To support CPU socket hot-removal, all PCI devices, including IOAT
>>>>> devices embedded in the socket, will be hot-removed. The ideal solution
>>>>> is to enhance the dmaengine core and IOAT driver to support hot-removal,
>>>>> but that's too hard.
>>>>>
>>>>> This patch implements a hack to disable IOAT devices under hotplug-capable
>>>>> CPU sockets so it won't break socket hot-removal.
>>>>>
>>> So below looks okay, though I wonder how hard it would be to fix hot unplug?
>> Hi Vinod,
>> Thanks for the review. About three years ago I worked out a
>> patch set to enhance the dmaengine core and IOAT device driver to
>> support hot-removal. But it was rejected due to concerns about the
>> performance penalty caused by usage tracking.
>> To support hot-removal, we need to track DMA channel usage
>> and have a way to reclaim DMA channels when hot-removing. This may cause
>> a noticeable performance penalty. Recently I tried again but still
>> haven't found a way to support hot-removal. So eventually I suggest
>> disabling IOAT devices on hot-plug capable systems.
>
> Or, as a different mechanism, take the module reference on channel
> allocation and release it on channel release.
>
> That way we don't need to count, and we ensure the dmaengine module is
> removed only when users have stopped using the device...

Hi Vinod,
The main trouble is caused by the fact that dmaengine uses a single
global reference count, dmaengine_ref_count. Once DMA clients have
increased the global reference count, they assume no DMA channel will go
away and fetch DMA channels directly from the channel_table[] table
without increasing the reference count on the individual channel. If we
try to enable per-channel reference counting, it may cause a performance
penalty on that lookup path.
Another issue is that a DMA channel could be used by any CPU, so we
can't guarantee a DMA channel is free even if we have stopped all PCI
devices under the same PCI host bridge as the IOAT device. And there's no
interface yet to reclaim channels from CPUs or other DMA clients.
So based on these factors, we suggest disabling IOAT devices on
hot-pluggable sockets.
Thanks!
Gerry
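To make the trade-off above concrete, here is a rough userspace sketch. It is a hypothetical model, not dmaengine code: model_global_ref only stands in for dmaengine_ref_count, model_chans[] for channel_table[], and every function name is made up for illustration. With a single global count, lookup is a plain array access, but no channel can be reclaimed while any client anywhere holds the count. With per-channel counts, an idle channel can be reclaimed, but every get/put pays an atomic operation on the fast path.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, NOT the real dmaengine API.
 * model_global_ref stands in for dmaengine_ref_count and
 * model_chans[] for channel_table[]; all names are illustrative.
 */
#define MODEL_NR_CHANS  4
#define MODEL_CHAN_DEAD (-1)   /* sentinel: channel has been reclaimed */

struct model_chan {
	atomic_int refcount;   /* per-channel users, or MODEL_CHAN_DEAD */
};

static atomic_int model_global_ref;
static struct model_chan model_chans[MODEL_NR_CHANS];

/* Scheme 1: global count. One increment pins every channel, so the
 * per-use lookup is a plain array access with no atomic operation. */
static void global_client_register(void)   { atomic_fetch_add(&model_global_ref, 1); }
static void global_client_unregister(void) { atomic_fetch_sub(&model_global_ref, 1); }

static struct model_chan *global_lookup(size_t idx)
{
	return &model_chans[idx];
}

/* Any channel may be reclaimed only when no client anywhere holds the
 * global count, even if nobody is using this particular channel. */
static bool global_can_remove(void)
{
	return atomic_load(&model_global_ref) == 0;
}

/* Scheme 2: per-channel count. Every use pays an atomic
 * compare-and-swap, which is the fast-path cost being debated. */
static bool chan_get(struct model_chan *c)
{
	int old = atomic_load(&c->refcount);

	while (old != MODEL_CHAN_DEAD) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
			return true;
	}
	return false;   /* channel already reclaimed by hot-removal */
}

static void chan_put(struct model_chan *c)
{
	atomic_fetch_sub(&c->refcount, 1);
}

/* Hot-removal path: reclaim the channel only if it is idle (count 0). */
static bool chan_try_remove(struct model_chan *c)
{
	int idle = 0;

	return atomic_compare_exchange_strong(&c->refcount, &idle,
					      MODEL_CHAN_DEAD);
}
```

The sketch only models the counting cost. It does not address the second problem raised above: reclaiming a channel that some CPU or other DMA client is still actively using.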