Re: Infrastructure for zerocopy I/O

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 1 Dec 2015, Steinar H. Gunderson wrote:

> >> +	usbm->mem = mem;
> >> +	usbm->offset = virt_to_phys(mem);
> >> +	usbm->size = size;
> >> +	usbm->ps = ps;
> >> +	spin_lock_irqsave(&ps->lock, flags);
> >> +	list_add_tail(&usbm->memlist, &ps->memory_list);
> >> +	spin_unlock_irqrestore(&ps->lock, flags);
> >> +
> >> +	return usbm;
> >> +}
> > This subroutine should be merged into usbdev_mmap().
> 
> Hm. Can you elaborate a bit on why? I thought it was nice to have it as a
> counterpoint to dec_use_count(), the deallocation function, and reasonably
> short.

Well, it's partly a matter of taste.  But there's also a race: This 
code adds usbm to ps->memory_list (making it available to URB 
submissions running on other CPUs) before usbm has been completely 
initialized (vm_start isn't set yet).

> > In fact, all the memory-oriented routines should be located together
> > in the source file.  It's confusing to put some of them near the start
> > and others near the end.
> 
> Is this about usbdev_mmap, or about usbdev_vm_{open,close}? I looked at the
> former as a syscall function and thus somehow better suited closer to e.g.
> the ioctl function. find_memory_area() can't be moved too far down since it's
> used from other places, unless you want me to do a forward declaration?

find_memory_area() is used in only one place.  You can put everything 
together if you move usbdev_mmap() and associated routines up near the 
start of the file, after usbdev_read().

> >> +		    uurb_start >= iter->vm_start &&
> >> +		    uurb->buffer_length <= iter->vm_start - uurb_start +
> >> +				(PAGE_SIZE << get_order(iter->size))) {
> > Shouldn't this be:
> > 
> > 			uurb->start >= iter->vm_start &&
> > 			uurb->start < iter->vm_start + iter->size &&
> > 			uurb->buffer_length <= iter->vm_start + iter->size -
> > 					uurb->start) {
> 
> I think both will work (modulo maybe issues with the PAGE_SIZE part?),
> but yours is clearer. Changed. (I assume you meant uurb_start, not
> uurb->start.)

(Yes, I did.)  On further thought, testing uurb_start is sufficient.
If uurb->buffer_length then turns out to be too big, the function
should return an error (or rather, an ERR_PTR) and the URB submission
should fail.

> >>  	as->userurb = arg;
> >> -	if (is_in && uurb->buffer_length > 0)
> >> +	if (is_in && uurb->buffer_length > 0 && usbm == NULL)
> > I think you should keep this even when usbm is not NULL.  No?
> 
> No, then processcompl() will try to copy data there, which defeats the point
> of zerocopy. What other use would the userbuffer member have?

My mistake; for some reason I was thinking of as->urb->transfer_buffer 
when I read as->userbuffer.

> >>  		as->userbuffer = uurb->buffer;
> >>  	else
> >>  		as->userbuffer = NULL;
> > When you use dma_zalloc_coherent(), you have to tell the USB core that
> > the buffer is already mapped for DMA by setting the URB_NO_TRANSFER_DMA_MAP
> > flag in urb->transfer_flags and by storing the DMA address in
> > urb->transfer_dma.
> 
> Done. I have one question, though: Should I offset the transfer_dma address
> to match with the offset in the buffer, or is this just a handle? (I've
> assumed the latter, but I'm not at all sure it's correct.)

No, you do have to offset transfer_dma_address to match.

> >> +	if (remap_pfn_range(vma, vma->vm_start,
> >> +			    virt_to_phys(usbm->mem) >> PAGE_SHIFT,
> >> +			    size, vma->vm_page_prot) < 0)
> >> +		return -EAGAIN;
> > As Oliver pointed out, the memory needs to be reclaimed and accounted
> > for on the failure pathways.
> 
> Done. (I saw you used a plural; is there more than this one?)

Well, alloc_urb_memory() also can fail.

> New patch attached. However, it's not working; I'm getting several of these
> messages:
> 
>   [   28.796244] DMAR: Allocating domain for 2-2 failed

I don't know what the reason is for that.  It may be that your kernel 
isn't configured to allocate as much coherent memory as you are asking 
for.  We'll have to investigate further, maybe ask somebody who knows 
more about how the DMA subsystem works.

---------------

> This is essentially a patch by Markus Rechberger with some updates.
> The original can be found at
> 
>   http://sundtek.de/support/devio_mmap_v0.4.diff
> 
> This version has the following changes:
> 
>   - Rebased against a newer kernel (with some conflicts fixed).
>   - Fixed all checkpatch violations and some style issues.
>   - Fixes an issue where isochronous transfers would not really be
>     zero-copy, but go through a pointless memcpy from one area to
>     itself.
>   - Ask for cached memory instead of uncached.

Actually, with dma_zalloc_coherent() you really are asking for
uncached memory (on systems where it makes a difference).

>   - Drop the custom ioctl; use mmap only for allocation.
> 
> Signed-off-by: Steinar H. Gunderson <sesse@xxxxxxxxxx>
> Signed-off-by: Markus Rechberger <mrechberger@xxxxxxxxx>
> ---
>  drivers/usb/core/devio.c | 202 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 192 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> index 38ae877c..95c9fed 100644
> --- a/drivers/usb/core/devio.c
> +++ b/drivers/usb/core/devio.c
> @@ -50,6 +50,7 @@
>  #include <linux/user_namespace.h>
>  #include <linux/scatterlist.h>
>  #include <linux/uaccess.h>
> +#include <linux/dma-mapping.h>
>  #include <asm/byteorder.h>
>  #include <linux/moduleparam.h>
>  
> @@ -69,6 +70,7 @@ struct usb_dev_state {
>  	spinlock_t lock;            /* protects the async urb lists */
>  	struct list_head async_pending;
>  	struct list_head async_completed;
> +	struct list_head memory_list;
>  	wait_queue_head_t wait;     /* wake up if a request completed */
>  	unsigned int discsignr;
>  	struct pid *disc_pid;
> @@ -79,6 +81,17 @@ struct usb_dev_state {
>  	u32 disabled_bulk_eps;
>  };
>  
> +struct usb_memory {
> +	struct list_head memlist;
> +	int vma_use_count;
> +	int usb_use_count;

This name has disturbed me.  It seems like urb_use_count would be more
descriptive ("urb" rather than "usb").

> +	u32 size;
> +	void *mem;
> +	dma_addr_t dma_handle;
> +	unsigned long vm_start;
> +	struct usb_dev_state *ps;
> +};
> +
>  struct async {
>  	struct list_head asynclist;
>  	struct usb_dev_state *ps;
> @@ -89,6 +102,7 @@ struct async {
>  	void __user *userbuffer;
>  	void __user *userurb;
>  	struct urb *urb;
> +	struct usb_memory *usbm;
>  	unsigned int mem_usage;
>  	int status;
>  	u32 secid;
> @@ -157,6 +171,75 @@ static int connected(struct usb_dev_state *ps)
>  			ps->dev->state != USB_STATE_NOTATTACHED);
>  }
>  
> +static struct usb_memory *
> +alloc_usb_memory(struct usb_dev_state *ps, size_t size)
> +{
> +	struct usb_memory *usbm;
> +	void *mem;
> +	unsigned long flags;
> +	dma_addr_t dma_handle;
> +
> +	mem = dma_zalloc_coherent(&ps->dev->dev, size, &dma_handle,
> +				  GFP_KERNEL | GFP_DMA32);

The GFP_DMA32 isn't necessary now.  dma_zalloc_coherent() is smart
enough to allocate memory that is compatible with the device's DMA
mask.  If the hardware can do 64-bit DMA then the buffer doesn't need
to be located in the first 4 GB.

By the way, the convention in this file is to indent continuation
lines by two tab stops beyond the first line, not to align things with
an open paren or anything else.  (Applies elsewhere too.)

> +	if (!mem)
> +		return NULL;
> +
> +	usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL);
> +	if (!usbm) {
> +		dma_free_coherent(&ps->dev->dev, size, mem, dma_handle);
> +		return NULL;
> +	}

I would allocate these two in the opposite order.  Just because the
DMA routines involve more work than kzalloc/kfree.

> +	usbm->mem = mem;
> +	usbm->dma_handle = dma_handle;
> +	usbm->size = size;
> +	usbm->ps = ps;
> +	spin_lock_irqsave(&ps->lock, flags);
> +	list_add_tail(&usbm->memlist, &ps->memory_list);
> +	spin_unlock_irqrestore(&ps->lock, flags);

Like I mentioned earlier, it shouldn't be added to the list until it
is ready to be used.

> +
> +	return usbm;
> +}
> +
> +static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
> +{
> +	struct usb_dev_state *ps = usbm->ps;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ps->lock, flags);
> +	--*count;
> +	if (usbm->usb_use_count == 0 && usbm->vma_use_count == 0) {
> +		list_del_init(&usbm->memlist);
> +		dma_free_coherent(&ps->dev->dev, usbm->size, usbm->mem,
> +				  usbm->dma_handle);
> +		usbfs_decrease_memory_usage(
> +			usbm->size + sizeof(struct usb_memory));
> +		kfree(usbm);
> +	}
> +	spin_unlock_irqrestore(&ps->lock, flags);
> +}
> +
> +static struct usb_memory *
> +find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
> +{
> +	struct usb_memory *usbm = NULL, *iter;
> +	unsigned long flags;
> +	unsigned long uurb_start = (unsigned long)uurb->buffer;
> +
> +	spin_lock_irqsave(&ps->lock, flags);
> +	list_for_each_entry(iter, &ps->memory_list, memlist) {
> +		if (uurb_start >= iter->vm_start &&
> +		    uurb_start < iter->vm_start + iter->size &&
> +		    uurb->buffer_length <= iter->vm_start + iter->size -
> +			uurb_start) {
> +			usbm = iter;
> +			usbm->usb_use_count++;
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(&ps->lock, flags);
> +	return usbm;
> +}
> +
>  static loff_t usbdev_lseek(struct file *file, loff_t offset, int orig)
>  {
>  	loff_t ret;
> @@ -288,6 +371,7 @@ static struct async *alloc_async(unsigned int numisoframes)
>  
>  static void free_async(struct async *as)
>  {
> +	struct usb_memory *usbm = NULL;

Not needed any more.

>  	int i;
>  
>  	put_pid(as->pid);
> @@ -297,8 +381,13 @@ static void free_async(struct async *as)
>  		if (sg_page(&as->urb->sg[i]))
>  			kfree(sg_virt(&as->urb->sg[i]));
>  	}
> +
>  	kfree(as->urb->sg);
> -	kfree(as->urb->transfer_buffer);
> +	if (as->usbm == NULL)
> +		kfree(as->urb->transfer_buffer);
> +	else
> +		dec_usb_memory_use_count(usbm, &as->usbm->usb_use_count);
> +
>  	kfree(as->urb->setup_packet);
>  	usb_free_urb(as->urb);
>  	usbfs_decrease_memory_usage(as->mem_usage);
> @@ -910,6 +999,7 @@ static int usbdev_open(struct inode *inode, struct file *file)
>  	INIT_LIST_HEAD(&ps->list);
>  	INIT_LIST_HEAD(&ps->async_pending);
>  	INIT_LIST_HEAD(&ps->async_completed);
> +	INIT_LIST_HEAD(&ps->memory_list);
>  	init_waitqueue_head(&ps->wait);
>  	ps->discsignr = 0;
>  	ps->disc_pid = get_pid(task_pid(current));
> @@ -962,6 +1052,13 @@ static int usbdev_release(struct inode *inode, struct file *file)
>  		free_async(as);
>  		as = async_getcompleted(ps);
>  	}
> +
> +	/* Any elements still left on this list are still in use and cannot
> +	 * be deleted here, but will be freed once the counters go to 0
> +	 * (see dec_usb_memory_use_count()).
> +	 */
> +	list_del(&ps->memory_list);
> +
>  	kfree(ps);
>  	return 0;
>  }
> @@ -1291,6 +1388,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
>  	struct usb_host_endpoint *ep;
>  	struct async *as = NULL;
>  	struct usb_ctrlrequest *dr = NULL;
> +	struct usb_memory *usbm = NULL;
>  	unsigned int u, totlen, isofrmlen;
>  	int i, ret, is_in, num_sgs = 0, ifnum = -1;
>  	int number_of_packets = 0;
> @@ -1372,9 +1470,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
>  			uurb->type = USBDEVFS_URB_TYPE_INTERRUPT;
>  			goto interrupt_urb;
>  		}
> -		num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE);
> -		if (num_sgs == 1 || num_sgs > ps->dev->bus->sg_tablesize)
> -			num_sgs = 0;
> +		/* do not use SG buffers when memory mapped segments
> +		 * are in use
> +		 */
> +		if (find_memory_area(ps, uurb) == NULL) {

Don't call find_memory_area() twice!  Save this result and use it
later.  Also, move this call up before the "switch" statement so that
it will apply to all transfer types, not just bulk.

Suitable adjustments will be needed in the rest of this routine, but
nothing that requires additional review comments.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux