Dear Linux kernel team,

I am working on a PCIe device driver and need to transfer up to 20 MB of data. Unfortunately the device has no scatter/gather controller, so I have to do it the usual way. In the code it is done as follows (in short):

    #define SG_MAX_ORDER 10

    long ioctl_dma(struct file *filp, unsigned int *cmd_p, unsigned long *arg_p)
    {
        unsigned long arg = *arg_p;

        /* one physically contiguous buffer of 2^SG_MAX_ORDER pages */
        max_order_length = (2 << (SG_MAX_ORDER - 1)) * PAGE_SIZE;
        pWriteBuf = (void *)__get_free_pages(GFP_KERNEL, SG_MAX_ORDER);
        pTmpDmaHandle = pci_map_single(pdev, pWriteBuf, max_order_length,
                                       PCI_DMA_FROMDEVICE);

        /* 1. get user buffer and data */
        copy_from_user(&dma_data, (device_ioctrl_dma *)arg, (size_t)io_dma_size);
        /* 2. get DMA size from user buffer */
        tmp_dma_size = dma_data.dma_size;
        /* 3. how many DMAs have to be done */
        nr_entries = tmp_dma_size / max_order_length;
        /* 4. one DMA per chunk, then copy the chunk to user space */
        for (int i = 0; i < nr_entries; ++i) {
            /* make DMA */
            pci_dma_sync_single_for_cpu(pdev, pTmpDmaHandle, max_order_length,
                                        PCI_DMA_FROMDEVICE);
            copy_to_user((void *)(arg + tmp_user_offset), pWriteBuf,
                         max_order_length);
            pci_dma_sync_single_for_device(pdev, pTmpDmaHandle, max_order_length,
                                           PCI_DMA_FROMDEVICE);
            tmp_user_offset += max_order_length;
        }
    }

This works fine and gives around 87 ms for 20 MB in the user application.

Then I just do:

    get_user_pages((unsigned long)arg,   /* start */
                   pDmaUnit->nr_pages,   /* length in pages */
                   1,                    /* >0 --> write to user space */
                   0,                    /* force; drivers should set 0 */
                   pDmaUnit->pages,
                   NULL);

The DMA time goes to ~50 ms (was 87 ms).

Regards,
Ludwig
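
A note on the chunk arithmetic in step 3 of the loop above: the integer division tmp_dma_size / max_order_length counts only full chunks, so a request that is not a multiple of the chunk size would leave a tail untransferred. The following is a minimal user-space sketch of that arithmetic, not driver code. It assumes 4 KiB pages (EXAMPLE_PAGE_SIZE is an assumption), and the helpers full_chunks, tail_bytes and chunks_needed are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: 4 KiB pages, as on typical x86 systems. */
#define EXAMPLE_PAGE_SIZE 4096UL
#define SG_MAX_ORDER 10

/* Bytes covered by one order-SG_MAX_ORDER allocation: 2^10 pages. */
static size_t max_order_length(void)
{
    return ((size_t)1 << SG_MAX_ORDER) * EXAMPLE_PAGE_SIZE;
}

/* Number of full chunks, i.e. what dma_size / chunk evaluates to. */
static size_t full_chunks(size_t dma_size, size_t chunk)
{
    assert(chunk > 0);
    return dma_size / chunk;
}

/* Bytes left over after the full chunks have been transferred. */
static size_t tail_bytes(size_t dma_size, size_t chunk)
{
    assert(chunk > 0);
    return dma_size % chunk;
}

/* Chunks needed to cover the whole request, counting a partial last chunk. */
static size_t chunks_needed(size_t dma_size, size_t chunk)
{
    assert(chunk > 0);
    return dma_size / chunk + (dma_size % chunk != 0);
}
```

Under these assumptions one chunk is 4 MiB. A request of exactly 20 MiB splits into 5 full chunks with no tail, while a request of 20,000,000 bytes gives 4 full chunks plus a tail of 3,222,784 bytes that a loop over full chunks alone would skip.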