On 3/3/25 8:41 AM, Carlos Sergio Bederian wrote:
> On Mon, Mar 3, 2025 at 12:25 PM Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
>>
>>
>>
>> On 3/1/25 8:17 AM, Carlos Sergio Bederian wrote:
>>> On Fri, Feb 28, 2025 at 6:39 PM Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 2/28/25 2:23 PM, Carlos Sergio Bederian wrote:
>>>>> I work at an HPC center and I've been trying to figure out why the
>>>>> knem intra-node communication kernel module stopped being able to use
>>>>> IOAT to offload memcpy at some point in time, presumably a long time
>>>>> ago.
>>>>> The knem module uses dma_find_channel(DMA_MEMCPY) to get a dma_chan so
>>>>> I wrote a test kernel module that tries to grab a dma_chan using both
>>>>> dma_find_channel and dma_request_channel and then submits a memcpy.
>>>>> dma_request_channel succeeds in returning a DMA_MEMCPY channel, but
>>>>> dma_find_channel never does, regardless of order. This is on a Debian
>>>>> 6.12.9 kernel.
>>>>> Is there anything I'm missing?
>>>>
>>>> Does dmatest work for you?
>>> Yup, I've just compiled 6.12.17 with dmatest and it ran fine on every channel
>>> listed in /sys/class/dma. No changes wrt dma_find_channel.
>>
>> If dmatest is working then there does not appear to be any kernel
>> regressions AFAICT. You can either try to do a git bisect and figure out
>> what changed for you, or you can do some code tracing with
>> dma_find_channel() and see why your code is failing to locate a channel.
>> You can also compare your code with dmatest and see if there is anything
>> you may need to tweak for your code.
>
> AFAICT dmatest only calls dma_request_channel(), it doesn't cover
> dma_find_channel().

I think what changed was setting the channels in ioat with DMA_PRIVATE,
which made them unavailable with channel_table that dma_find_channel() is
expecting. I suggest you switch your code to use dma_request_channel().
https://elixir.bootlin.com/linux/v6.14-rc4/source/drivers/dma/ioat/init.c#L1160

>
>>
>> DJ
>>
>>>
>>>> Also, make sure dmatest isn't loaded when you have your module loaded.
>>> dmatest wasn't even built.
>>>
>>>> Or any other kernel module that uses dma like ntb_transport isn't claiming
>>>> the channels.
>>> No users AFAICT.
>>>
>>>>
>>>> DJ
>>>>>
>>>>> static struct dma_chan* dma_req(void) {
>>>>>     struct dma_chan* chan = NULL;
>>>>>     dma_cap_mask_t mask;
>>>>>     dma_cap_zero(mask);
>>>>>     dma_cap_set(DMA_MEMCPY, mask);
>>>>>     chan = dma_request_channel(mask, NULL, NULL);
>>>>>     if (!chan) {
>>>>>         pr_err("dmacopy: dma_request_channel didn't return a channel");
>>>>>     } else {
>>>>>         pr_info("dmacopy: dma_request_channel succeeded");
>>>>>     }
>>>>>     return chan;
>>>>> }
>>>>>
>>>>> static struct dma_chan* dma_find(void) {
>>>>>     struct dma_chan* chan = NULL;
>>>>>     dmaengine_get();
>>>>>     chan = dma_find_channel(DMA_MEMCPY);
>>>>>     if (!chan) {
>>>>>         pr_err("dmacopy: dma_find_channel didn't return a channel");
>>>>>         dmaengine_put();
>>>>>     } else {
>>>>>         pr_info("dmacopy: dma_find_channel succeeded");
>>>>>     }
>>>>>     return chan;
>>>>> }
>>>>>
>>>>
>>
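
For illustration only, the difference between the two lookups in the quoted
test module can be modeled in plain userspace C. This is a toy sketch, not
kernel code: every toy_* name and TOY_* flag below is hypothetical, and it
assumes (as described above) that the shared table consulted by
dma_find_channel() is built only from devices that do not carry DMA_PRIVATE,
while an exclusive request can still pick up any unused channel with the
right capability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical capability bits standing in for DMA_MEMCPY / DMA_PRIVATE. */
#define TOY_CAP_MEMCPY  (1u << 0)
#define TOY_CAP_PRIVATE (1u << 1)

/* Hypothetical stand-in for a DMA device with a single channel. */
struct toy_dma_device {
    const char *name;
    unsigned int caps;  /* bitmask of TOY_CAP_* */
    int in_use;         /* set once handed out exclusively */
};

/* Plays the role of channel_table for the memcpy capability. */
struct toy_dma_device *toy_channel_table;

/* Rebuild the shared table; devices flagged private are never published. */
void toy_rebalance(struct toy_dma_device *devs, size_t n)
{
    assert(devs != NULL);
    toy_channel_table = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!(devs[i].caps & TOY_CAP_MEMCPY))
            continue;
        if (devs[i].caps & TOY_CAP_PRIVATE)
            continue;
        toy_channel_table = &devs[i];
        return;
    }
}

/* Shared lookup: only sees what toy_rebalance() published. */
struct toy_dma_device *toy_find_channel(void)
{
    return toy_channel_table;
}

/* Exclusive lookup: walks the devices directly, private or not. */
struct toy_dma_device *toy_request_channel(struct toy_dma_device *devs,
                                           size_t n)
{
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!(devs[i].caps & TOY_CAP_MEMCPY) || devs[i].in_use)
            continue;
        devs[i].in_use = 1;
        return &devs[i];
    }
    return NULL;
}

/* Try both lookups against one device registered with driver_caps. */
void toy_probe(unsigned int driver_caps, int *find_ok, int *request_ok)
{
    struct toy_dma_device devs[] = { { "toy-ioat0", driver_caps, 0 } };

    toy_rebalance(devs, 1);
    *find_ok = toy_find_channel() != NULL;
    *request_ok = toy_request_channel(devs, 1) != NULL;
    toy_channel_table = NULL;  /* devs is about to go out of scope */
    printf("find=%d request=%d\n", *find_ok, *request_ok);
}
```

Calling toy_probe(TOY_CAP_MEMCPY | TOY_CAP_PRIVATE, ...) reproduces the
pattern reported at the top of the thread (request succeeds, find does not),
while toy_probe(TOY_CAP_MEMCPY, ...) lets both succeed. The real dmaengine
core does considerably more (reference counting, per-CPU tables, filter
callbacks), so treat this purely as a mental model.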