Re: Infrastructure for zerocopy I/O

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 02, 2015 at 10:54:52AM -0500, Alan Stern wrote:
> Well, it's partly a matter of taste.  But there's also a race: This 
> code adds usbm to ps->memory_list (making it available to URB 
> submissions running on other CPUs) before usbm has been completely 
> initialized (vm_start isn't set yet).

OK. Joined the functions.

> find_memory_area() is used in only one place.  You can put everything 
> together if you move usbdev_mmap() and associated routines up near the 
> start of the file, after usbdev_read().

Done. I hope I got it the way you wanted.

> (Yes, I did.)  On further thought, testing uurb_start is sufficient.
> If uurb->buffer_length then turns out to be too big, the function
> should return an error (or rather, an ERR_PTR) and the URB submission
> should fail.

I don't understand “should” here. I read it as “you should change your
function for this”, but this doesn't rhyme with that uurb_start should be
sufficient. Do you want me to just remove the check? Is there something else
that would guard against it?

>> Done. (I saw you used a plural; is there more than this one?)
> Well, alloc_urb_memory() also can fail.

I can't find this function. What do you mean?

>> New patch attached. However, it's not working; I'm getting several of these
>> messages:
>> 
>>   [   28.796244] DMAR: Allocating domain for 2-2 failed
> I don't know what the reason is for that.  It may be that your kernel 
> isn't configured to allocate as much coherent memory as you are asking 
> for.  We'll have to investigate further, maybe ask somebody who knows 
> more about how the DMA subsystem works.

I found out why; I had messed up the transfer_dma address calculation,
subtracting the kernel memory address instead of the VM address. After that,
the patch works again.

>> +struct usb_memory {
>> +	struct list_head memlist;
>> +	int vma_use_count;
>> +	int usb_use_count;
> This name has disturbed me.  It seems like urb_use_count would be more
> descriptive ("urb" rather than "usb").

Changed.

> The GFP_DMA32 isn't necessary now.  dma_zalloc_coherent() is smart
> enough to allocate memory that is compatible with the device's DMA
> mask.  If the hardware can do 64-bit DMA then the buffer doesn't need
> to be located in the first 4 GB.

Done.

> By the way, the convention in this file is to indent continuation
> lines by two tab stops beyond the first line, not to align things with
> an open paren or anything else.  (Applies elsewhere too.)

Done. (That's an unusual convention.)

> I would allocate these two in the opposite order.  Just because the
> DMA routines involve more work than kzalloc/kfree.

Done. My thought, by the way, was that the DMA routines were much more likely
to fail.

>> +	usbm->mem = mem;
>> +	usbm->dma_handle = dma_handle;
>> +	usbm->size = size;
>> +	usbm->ps = ps;
>> +	spin_lock_irqsave(&ps->lock, flags);
>> +	list_add_tail(&usbm->memlist, &ps->memory_list);
>> +	spin_unlock_irqrestore(&ps->lock, flags);
> Like I mentioned earlier, it shouldn't be added to the list until it
> is ready to be used.

Done.

>>  static void free_async(struct async *as)
>>  {
>> +	struct usb_memory *usbm = NULL;
> Not needed any more.

Done.

>> +		if (find_memory_area(ps, uurb) == NULL) {
> Don't call find_memory_area() twice!  Save this result and use it
> later.  Also, move this call up before the "switch" statement so that
> it will apply to all transfer types, not just bulk.
> 
> Suitable adjustments will be needed in the rest of this routine, but
> nothing that requires additional review comments.

This part I don't understand. You want to populate usbm (by calling
find_memory_area()) unconditionally, also for control transfers? I can't see
offhand another way to call it only once during the function, save for a bool
that says if we called it or not.

New patch attached. Note that I haven't had the chance to retest after some
of the latest changes.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
>From d03fc262072183887825f889996244e22ff74438 Mon Sep 17 00:00:00 2001
From: "Steinar H. Gunderson" <sesse@xxxxxxxxxx>
Date: Thu, 26 Nov 2015 01:19:13 +0100
Subject: [PATCH] Add support for usbfs zerocopy.

Add a new interface for userspace to preallocate memory that can be
used with usbfs. This gives two primary benefits:

 - Zerocopy; data no longer needs to be copied between the userspace
   and the kernel, but can instead be read directly by the driver from
   userspace's buffers. This works for both bulk and isochronous transfers;
   isochronous also no longer need to memset() the buffer to zero to avoid
   leaking kernel data.

 - Once the buffers are allocated, USB transfers can no longer fail due to
   memory fragmentation; previously, long-running programs could run into
   problems finding a large enough contiguous memory chunk, especially on
   embedded systems or at high rates.

Memory is allocated by using mmap() against the usbfs file descriptor,
and similarly deallocated by munmap(). Once memory has been allocated,
using it as pointers to a bulk or isochronous operation means you will
automatically get zerocopy behavior. Note that this also means you cannot
modify outgoing data until the transfer is complete. The same holds for
data on the same cache lines as incoming data; DMA modifying them at the
same time could lead to your changes being overwritten.

Largely based on a patch by Markus Rechberger with some updates. The original
patch can be found at:

  http://sundtek.de/support/devio_mmap_v0.4.diff

Signed-off-by: Steinar H. Gunderson <sesse@xxxxxxxxxx>
Signed-off-by: Markus Rechberger <mrechberger@xxxxxxxxx>
---
 drivers/usb/core/devio.c | 201 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 191 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 38ae877c..e31bae1 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -50,6 +50,7 @@
 #include <linux/user_namespace.h>
 #include <linux/scatterlist.h>
 #include <linux/uaccess.h>
+#include <linux/dma-mapping.h>
 #include <asm/byteorder.h>
 #include <linux/moduleparam.h>
 
@@ -69,6 +70,7 @@ struct usb_dev_state {
 	spinlock_t lock;            /* protects the async urb lists */
 	struct list_head async_pending;
 	struct list_head async_completed;
+	struct list_head memory_list;
 	wait_queue_head_t wait;     /* wake up if a request completed */
 	unsigned int discsignr;
 	struct pid *disc_pid;
@@ -79,6 +81,17 @@ struct usb_dev_state {
 	u32 disabled_bulk_eps;
 };
 
+struct usb_memory {
+	struct list_head memlist;
+	int vma_use_count;
+	int urb_use_count;
+	u32 size;
+	void *mem;
+	dma_addr_t dma_handle;
+	unsigned long vm_start;
+	struct usb_dev_state *ps;
+};
+
 struct async {
 	struct list_head asynclist;
 	struct usb_dev_state *ps;
@@ -89,6 +102,7 @@ struct async {
 	void __user *userbuffer;
 	void __user *userurb;
 	struct urb *urb;
+	struct usb_memory *usbm;
 	unsigned int mem_usage;
 	int status;
 	u32 secid;
@@ -157,6 +171,107 @@ static int connected(struct usb_dev_state *ps)
 			ps->dev->state != USB_STATE_NOTATTACHED);
 }
 
+static void dec_usb_memory_use_count(struct usb_memory *usbm, int *count)
+{
+	struct usb_dev_state *ps = usbm->ps;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ps->lock, flags);
+	if (count) {
+		--*count;
+	}
+	if (usbm->urb_use_count == 0 && usbm->vma_use_count == 0) {
+		list_del_init(&usbm->memlist);
+		spin_unlock_irqrestore(&ps->lock, flags);
+
+		usb_free_coherent(ps->dev, usbm->size, usbm->mem,
+				usbm->dma_handle);
+		usbfs_decrease_memory_usage(
+			usbm->size + sizeof(struct usb_memory));
+		kfree(usbm);
+	} else {
+		spin_unlock_irqrestore(&ps->lock, flags);
+	}
+}
+
+static void usbdev_vm_open(struct vm_area_struct *vma)
+{
+	struct usb_memory *usbm = vma->vm_private_data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&usbm->ps->lock, flags);
+	++usbm->vma_use_count;
+	spin_unlock_irqrestore(&usbm->ps->lock, flags);
+}
+
+static void usbdev_vm_close(struct vm_area_struct *vma)
+{
+	struct usb_memory *usbm = vma->vm_private_data;
+
+	dec_usb_memory_use_count(usbm, &usbm->vma_use_count);
+}
+
+struct vm_operations_struct usbdev_vm_ops = {
+	.open = usbdev_vm_open,
+	.close = usbdev_vm_close
+};
+
+static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct usb_memory *usbm = NULL;
+	struct usb_dev_state *ps = file->private_data;
+	size_t size = vma->vm_end - vma->vm_start;
+	void *mem;
+	unsigned long flags;
+	dma_addr_t dma_handle;
+	int ret;
+
+	usbm = kzalloc(sizeof(struct usb_memory), GFP_KERNEL);
+	if (!usbm)
+		return -ENOMEM;
+
+	ret = usbfs_increase_memory_usage(size + sizeof(struct usb_memory));
+	if (ret) {
+		kfree(usbm);
+		return ret;
+	}
+
+	mem = usb_alloc_coherent(ps->dev, size, GFP_KERNEL, &dma_handle);
+	if (!mem) {
+		usbfs_decrease_memory_usage(
+			usbm->size + sizeof(struct usb_memory));
+		kfree(usbm);
+		return -ENOMEM;
+	}
+
+	memset(mem, 0, size);
+
+	usbm->mem = mem;
+	usbm->dma_handle = dma_handle;
+	usbm->size = size;
+	usbm->ps = ps;
+
+	if (remap_pfn_range(vma, vma->vm_start,
+			virt_to_phys(usbm->mem) >> PAGE_SHIFT,
+			size, vma->vm_page_prot) < 0) {
+		dec_usb_memory_use_count(usbm, NULL);
+		return -EAGAIN;
+	}
+
+	usbm->vm_start = vma->vm_start;
+	usbm->vma_use_count = 1;
+	vma->vm_flags |= VM_IO;
+	vma->vm_flags |= (VM_DONTEXPAND | VM_DONTDUMP);
+	vma->vm_ops = &usbdev_vm_ops;
+	vma->vm_private_data = usbm;
+
+	spin_lock_irqsave(&ps->lock, flags);
+	list_add_tail(&usbm->memlist, &ps->memory_list);
+	spin_unlock_irqrestore(&ps->lock, flags);
+
+	return 0;
+}
+
 static loff_t usbdev_lseek(struct file *file, loff_t offset, int orig)
 {
 	loff_t ret;
@@ -297,8 +412,13 @@ static void free_async(struct async *as)
 		if (sg_page(&as->urb->sg[i]))
 			kfree(sg_virt(&as->urb->sg[i]));
 	}
+
 	kfree(as->urb->sg);
-	kfree(as->urb->transfer_buffer);
+	if (as->usbm == NULL)
+		kfree(as->urb->transfer_buffer);
+	else
+		dec_usb_memory_use_count(as->usbm, &as->usbm->urb_use_count);
+
 	kfree(as->urb->setup_packet);
 	usb_free_urb(as->urb);
 	usbfs_decrease_memory_usage(as->mem_usage);
@@ -910,6 +1030,7 @@ static int usbdev_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&ps->list);
 	INIT_LIST_HEAD(&ps->async_pending);
 	INIT_LIST_HEAD(&ps->async_completed);
+	INIT_LIST_HEAD(&ps->memory_list);
 	init_waitqueue_head(&ps->wait);
 	ps->discsignr = 0;
 	ps->disc_pid = get_pid(task_pid(current));
@@ -962,6 +1083,13 @@ static int usbdev_release(struct inode *inode, struct file *file)
 		free_async(as);
 		as = async_getcompleted(ps);
 	}
+
+	/* Any elements still left on this list are still in use and cannot
+	 * be deleted here, but will be freed once the counters go to 0
+	 * (see dec_usb_memory_use_count()).
+	 */
+	list_del(&ps->memory_list);
+
 	kfree(ps);
 	return 0;
 }
@@ -1283,6 +1411,28 @@ static int proc_setconfig(struct usb_dev_state *ps, void __user *arg)
 	return status;
 }
 
+static struct usb_memory *
+find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
+{
+	struct usb_memory *usbm = NULL, *iter;
+	unsigned long flags;
+	unsigned long uurb_start = (unsigned long)uurb->buffer;
+
+	spin_lock_irqsave(&ps->lock, flags);
+	list_for_each_entry(iter, &ps->memory_list, memlist) {
+		if (uurb_start >= iter->vm_start &&
+		    uurb_start < iter->vm_start + iter->size &&
+		    uurb->buffer_length <= iter->vm_start + iter->size -
+			uurb_start) {
+			usbm = iter;
+			usbm->urb_use_count++;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&ps->lock, flags);
+	return usbm;
+}
+
 static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
 			struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
 			void __user *arg)
@@ -1291,6 +1441,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 	struct usb_host_endpoint *ep;
 	struct async *as = NULL;
 	struct usb_ctrlrequest *dr = NULL;
+	struct usb_memory *usbm = NULL;
 	unsigned int u, totlen, isofrmlen;
 	int i, ret, is_in, num_sgs = 0, ifnum = -1;
 	int number_of_packets = 0;
@@ -1372,9 +1523,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 			uurb->type = USBDEVFS_URB_TYPE_INTERRUPT;
 			goto interrupt_urb;
 		}
-		num_sgs = DIV_ROUND_UP(uurb->buffer_length, USB_SG_SIZE);
-		if (num_sgs == 1 || num_sgs > ps->dev->bus->sg_tablesize)
-			num_sgs = 0;
+		/* do not use SG buffers when memory mapped segments
+		 * are in use
+		 */
+		if (find_memory_area(ps, uurb) == NULL) {
+			num_sgs = DIV_ROUND_UP(uurb->buffer_length,
+					USB_SG_SIZE);
+			if (num_sgs == 1 ||
+			    num_sgs > ps->dev->bus->sg_tablesize) {
+				num_sgs = 0;
+			}
+		}
 		if (ep->streams)
 			stream_id = uurb->stream_id;
 		break;
@@ -1445,10 +1604,11 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 	if (ret)
 		goto error;
 	as->mem_usage = u;
+	as->usbm = NULL;
 
 	if (num_sgs) {
 		as->urb->sg = kmalloc(num_sgs * sizeof(struct scatterlist),
-				      GFP_KERNEL);
+				GFP_KERNEL);
 		if (!as->urb->sg) {
 			ret = -ENOMEM;
 			goto error;
@@ -1476,21 +1636,33 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 			totlen -= u;
 		}
 	} else if (uurb->buffer_length > 0) {
-		as->urb->transfer_buffer = kmalloc(uurb->buffer_length,
+		if (!list_empty(&ps->memory_list)) {
+			usbm = find_memory_area(ps, uurb);
+			if (usbm) {
+				as->usbm = usbm;
+				as->urb->transfer_buffer = usbm->mem;
+			} else {
+				ret = -ENOMEM;
+				goto error;
+			}
+		} else {
+			as->urb->transfer_buffer = kmalloc(uurb->buffer_length,
 				GFP_KERNEL);
+		}
 		if (!as->urb->transfer_buffer) {
 			ret = -ENOMEM;
 			goto error;
 		}
 
-		if (!is_in) {
+		if (!is_in && usbm == NULL) {
 			if (copy_from_user(as->urb->transfer_buffer,
 					   uurb->buffer,
 					   uurb->buffer_length)) {
 				ret = -EFAULT;
 				goto error;
 			}
-		} else if (uurb->type == USBDEVFS_URB_TYPE_ISO) {
+		} else if (uurb->type == USBDEVFS_URB_TYPE_ISO &&
+				usbm == NULL) {
 			/*
 			 * Isochronous input data may end up being
 			 * discontiguous if some of the packets are short.
@@ -1545,10 +1717,16 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 	isopkt = NULL;
 	as->ps = ps;
 	as->userurb = arg;
-	if (is_in && uurb->buffer_length > 0)
+	if (is_in && uurb->buffer_length > 0 && usbm == NULL) {
 		as->userbuffer = uurb->buffer;
-	else
+	} else {
 		as->userbuffer = NULL;
+	}
+	if (usbm != NULL) {
+		as->urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
+		as->urb->transfer_dma = usbm->dma_handle +
+				((unsigned long)uurb->buffer - usbm->vm_start);
+	}
 	as->signr = uurb->signr;
 	as->ifnum = ifnum;
 	as->pid = get_pid(task_pid(current));
@@ -1604,6 +1782,8 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
 	return 0;
 
  error:
+	if (usbm)
+		dec_usb_memory_use_count(usbm, &usbm->urb_use_count);
 	kfree(isopkt);
 	kfree(dr);
 	if (as)
@@ -2373,6 +2553,7 @@ const struct file_operations usbdev_file_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =   usbdev_compat_ioctl,
 #endif
+	.mmap =           usbdev_mmap,
 	.open =		  usbdev_open,
 	.release =	  usbdev_release,
 };
-- 
2.1.4


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux