Re: [PATCH RFC] IB: Kernel Data Bridge (kdbr) module for Inter-VM data transfer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/27/2017 10:09 PM, Parav Pandit wrote:
Hi Marcel Apfelbaum,

It is not clear from RFC why there is need for kernel component to transfer data from VM To VM through character driver.
Can this be achieved with just user space tool?


Hi Parav,
Thank you for your question.

In our design, the receiver allocates (User level Guest allocation) the buffer
and the sender (User level Guest in the same host) doesn't "send" the buffer,
just the guest virtual address and the length. We need the kernel part in order
to copy data between different user space QEMU processes securely and efficiently.
A user space implementation of such copy may not achieve those goal.

That being said, one implementation does not exclude the other,
so in the future we can support both ways.

As you can see in the RFC, Roadmap section, we intend to separate
the API to a library especially for this purpose.

Thanks,
Marcel


Parav

-----Original Message-----
From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
owner@xxxxxxxxxxxxxxx] On Behalf Of Marcel Apfelbaum
Sent: Monday, March 27, 2017 12:37 PM
To: linux-rdma@xxxxxxxxxxxxxxx
Cc: marcel@xxxxxxxxxx; yuval.shaia@xxxxxxxxxx
Subject: [PATCH RFC] IB: Kernel Data Bridge (kdbr) module for Inter-VM data
transfer

Objective
=========
Implement a transparent, fast, and secure data transfer mechanism
between multiple VMs in the same host.

General description
===================
This is a very early RFC for a new kernel module that facilitates direct VM to
VM data transfer. The module is developed with RDMA concept in mind and
is used by QEMU's pvrdma device implementation which is also in RFC stage.

However, the module is not limited to RDMA/pvrdma device and can be
used for general inter-process data transfer when a large amount of data is
necessary to be exchanged.

Down the road we plan to make it possible to be able to support inter-
machine communication by utilizing physical ROCE devices.

Note the driver performs only one copy operation from sender buffer
directly to the receive buffer.

RFC status
===========
The project is in early development stages and misses a good locking
mechanism, performant data structures, error flows and such.

We present it so we can get feedbacks on design, feature demands and to
receive comments from the community pointing us to the "right" direction.

What *does* work:
- The kdbr_test  (not yet user friendly)
  Steps:
      1. receiver -> run ./kdbr_test -g 1 -q 1 -p 2 -r 2
      2. sender -> run ./kdbr_test -s -g 2 -q 2 -p 1 -r 1
      3. sender -> send message (hit a key)
      4. receiver -> verify message (hit a key)
      5. close both clients
- Using QEMU's pvrdma device:
     - See the QEMU device on github:
https://github.com/yuvalshaia/qemu/tree/pvrdma.master
     - The guests need latest upstream kernel in order to have the pvrdma
driver.
     - Test with https://github.com/yuvalshaia/kibpingpong .
     - We will send a link to QEMU RFC shortly

Design
======
The following is a brief description of the major components in KDBR driver.

Queue
    An object that holds a list of buffer addresses to be used by KDBR
    when a remote peer transfers data. A process can use multiple
    queues to differentiate receive messages.

Connection
    An object that holds peer to peer connectivity information.
    - Reference to a queue for receiver-side process.
    - Peer info
    - Acknowledge type (datagram or connected mode).

Port
    An end point with unique ID.
    A process can have more then one port.
    Each port maintains a list of connections.

Request
    An object describing a send/receiver request.
    - Message type
    - A vector of buffer addresses to be sent/received.
    - Connection info
    - Request ID (used by KDBR to pass completion event to process).

Completion
   An object passed by KDBR to process to indicate a completion of an event.
   The KDBR notifies completion to both sender and receiver.
   (This object carries connection ID and request ID so the caller can
    locate an internal context and status of the operation.)

API
    The KDBR driver exposes a char device interface to user space.
    Steps to be executed by an user process to use the driver:
    1. Open the device file.
       This file descriptor will be used to register port(s).
    2. Register port(s).
       This operation will attach an user level process to a KDBR port.
       A port is identified by 64-bit ID and 64-bit network ID.
    3. Open the port's device file.
       The returned file descriptor will be used to connect and transmit data.
    4. Connect to peer.
    5. Receiver-side allocates buffer(s) and register their addresses with KDBR.
    6. Sender-side can now send buffer(s) to the peer..

Roadmap (out of order)
======================
- Utilize RoCE driver in order to support peers on a different host.
- Separate API to a library so the implementation can be changed in the
future.
- Considerate implementing zero copy by playing with page tables.

Any ideas, comments or suggestions would be highly appreciated.

Thanks,
Marcel Apfelbaum & Yuval Shaia

Signed-off-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
Signed-off-by: Marcel Apfelbaum <marcel@xxxxxxxxxx>
---
 Kbuild      |   5 +
 Makefile    |  20 ++
 kdbr.h      | 104 ++++++++
 kdbr_main.c | 857
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++
 kdbr_test.c | 193 ++++++++++++++
 5 files changed, 1179 insertions(+)
 create mode 100644 Kbuild
 create mode 100644 Makefile
 create mode 100644 kdbr.h
 create mode 100644 kdbr_main.c
 create mode 100644 kdbr_test.c

diff --git a/Kbuild b/Kbuild
new file mode 100644
index 0000000..692e5c7
--- /dev/null
+++ b/Kbuild
@@ -0,0 +1,5 @@
+#ccflags-y := -I$(src)/include
+
+obj-m := kdbr.o
+
+kdbr-y := kdbr_main.o
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..8170077
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,20 @@
+KVERSION ?= $(shell uname -r)
+KDIR ?= /lib/modules/${KVERSION}/build
+
+modules:
+	$(MAKE) -C $(KDIR) M=$(PWD) modules
+
+modules_install:
+	$(MAKE) -C $(KDIR) M=$(PWD) modules_install
+
+clean:
+	$(MAKE) -C $(KDIR) M=$(PWD) clean
+	rm -f kdbr_test
+
+test:
+	gcc kdbr_test.c -o kdbr_test
+
+clean_test:
+	rm -f kdbr_test
+
+.PHONY: modules modules_install clean clean_test
diff --git a/kdbr.h b/kdbr.h
new file mode 100644
index 0000000..6c8e834
--- /dev/null
+++ b/kdbr.h
@@ -0,0 +1,104 @@
+/*
+ * Kernel Data Bridge driver - API
+ *
+ * Copyright 2016 Red Hat, Inc.
+ * Copyright 2016 Oracle
+ *
+ * Authors:
+ *   Marcel Apfelbaum <marcel@xxxxxxxxxx>
+ *   Yuval Shaia <yuval.shaia@xxxxxxxxxx>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef _KDBR_H
+#define _KDBR_H
+
+#ifdef __KERNEL__
+#include <linux/uio.h>
+#define KDBR_MAX_IOVEC_LEN	UIO_FASTIOV
+#else
+#include <sys/uio.h>
+#define KDBR_MAX_IOVEC_LEN	8
+#endif
+
+#define KDBR_FILE_NAME "/dev/kdbr"
+#define KDBR_MAX_PORTS 255
+
+#define KDBR_IOC_MAGIC 0xBA
+
+#define KDBR_REGISTER_PORT	_IOWR(KDBR_IOC_MAGIC, 0, struct
kdbr_reg)
+#define KDBR_UNREGISTER_PORT	_IOW(KDBR_IOC_MAGIC, 1, int)
+#define KDBR_IOC_MAX		2
+
+
+enum kdbr_ack_type {
+	KDBR_ACK_IMMEDIATE,
+	KDBR_ACK_DELAYED,
+};
+
+struct kdbr_gid {
+	unsigned long net_id;
+	unsigned long id;
+};
+
+struct kdbr_peer {
+	struct kdbr_gid rgid;
+	unsigned long rqueue;
+};
+
+struct list_head;
+struct mutex;
+struct kdbr_connection {
+	unsigned long queue_id;
+	struct kdbr_peer peer;
+	enum kdbr_ack_type ack_type;
+	/* TODO: hide the below fields in the .c file */
+	struct list_head *sg_vecs_list;
+	struct mutex *sg_vecs_mutex;
+};
+
+struct kdbr_reg {
+	struct kdbr_gid gid; /* in */
+	int port; /* out */
+};
+
+#define KDBR_REQ_SIGNATURE	0x000000AB
+#define KDBR_REQ_POST_RECV	0x00000100
+#define KDBR_REQ_POST_SEND	0x00000200
+#define KDBR_REQ_POST_MREG	0x00000300
+#define KDBR_REQ_POST_RDMA	0x00000400
+
+struct kdbr_req {
+	unsigned int flags; /* 8 bits signature, 8 bits msg_type */
+	struct iovec vec[KDBR_MAX_IOVEC_LEN];
+	int vlen; /* <= KDBR_MAX_IOVEC_LEN */
+	int connection_id;
+	struct kdbr_peer peer;
+	unsigned long req_id;
+};
+
+#define KDBR_ERR_CODE_EMPTY_VEC		0x101
+#define KDBR_ERR_CODE_NO_MORE_RECV_BUF	0x102
+#define KDBR_ERR_CODE_RECV_BUF_PROT	0x103
+#define KDBR_ERR_CODE_INV_ADDR		0x104
+#define KDBR_ERR_CODE_INV_CONN_ID	0x105
+#define KDBR_ERR_CODE_NO_PEER		0x106
+
+struct kdbr_completion {
+	int connection_id;
+	unsigned long req_id;
+	int status; /* 0 = Success */
+};
+
+#define KDBR_PORT_IOC_MAGIC	0xBB
+
+#define KDBR_PORT_OPEN_CONN	_IOR(KDBR_PORT_IOC_MAGIC, 0, \
+				     struct kdbr_connection)
+#define KDBR_PORT_CLOSE_CONN	_IOR(KDBR_PORT_IOC_MAGIC, 1,
int)
+#define KDBR_PORT_IOC_MAX	4
+
+#endif
+
diff --git a/kdbr_main.c b/kdbr_main.c
new file mode 100644
index 0000000..edf910c
--- /dev/null
+++ b/kdbr_main.c
@@ -0,0 +1,857 @@
+/*
+ * Kernel Data Bridge driver for Linux
+ *
+ * This module enables machines with Intel VT-x extensions to run
+virtual
+ * machines without emulation or binary translation.
+ *
+ * Copyright 2016 Red Hat, Inc.
+ * Copyright 2016 Oracle
+ *
+ * Authors:
+ *   Marcel Apfelbaum <marcel@xxxxxxxxxx>
+ *   Yuval Shaia <yuval.shaia@xxxxxxxxxx>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/mutex.h>
+#include <linux/idr.h>
+#include <linux/highmem.h>
+#include <asm/uaccess.h>
+#include "kdbr.h"
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Marcel Apfelbaum & Yuval Shaia");
+
+#define KDBR_MAX_PORTS 255
+
+struct kdbr_driver_data {
+	struct class *class;
+	struct device *dev;
+	struct cdev cdev;
+	int major;
+
+	spinlock_t lock;
+
+	DECLARE_BITMAP(port_map, KDBR_MAX_PORTS);
+	struct list_head ports;
+};
+static struct kdbr_driver_data kdbr_data;
+
+struct kdbr_completion_elem {
+	struct list_head list;
+	struct kdbr_completion comp;
+};
+
+struct sg_vec {
+	/* TODO: Replace with ring buffer */
+	struct list_head list;
+	int vlen; /* <= KDBR_MAX_IOVEC_LEN */
+	struct page *userpage[KDBR_MAX_IOVEC_LEN];
+	void *userptr[KDBR_MAX_IOVEC_LEN];
+	int connection_id;
+	unsigned long req_id;
+};
+
+struct comp_ring {
+	/* List of 'completions' for this port */
+	struct list_head list;
+	struct mutex lock;
+	wait_queue_head_t queue;
+	char data_flag;
+};
+
+struct kdbr_port {
+	struct cdev cdev;
+	struct device *dev;
+
+	/* Next port in the list, head is in the kdbr_data */
+	struct list_head list;
+
+	/* connection ids map */
+	struct idr conn_idr;
+	struct mutex conn_mutex;
+
+	/*port's global id*/
+	struct kdbr_gid gid;
+
+	/* port id - device minor */
+	int id;
+	pid_t pid;
+
+	struct comp_ring comps;
+};
+
+/* Global routing table */
+struct peer_route {
+	/* TODO: Replace with RB-tree */
+	struct list_head list;
+	struct peer_key {
+		unsigned long net_id;
+		unsigned long id;
+		unsigned long queue;
+	} peer_key;
+	struct kdbr_port *port;
+	struct kdbr_connection *conn;
+};
+struct list_head global_route;
+
+static int add_global_route(unsigned long net_id, unsigned long id,
+			    unsigned long queue, struct kdbr_port *port,
+			    struct kdbr_connection *conn)
+{
+	/* TODO: No check is made to see if it is already there since we
+	 * will change the implementation anyway */
+	struct peer_route *peer_route = kmalloc(sizeof(*peer_route),
+						GFP_KERNEL);
+	if (!peer_route) {
+		pr_info("Fail to alloc route\n");
+		return -ENOMEM;
+	}
+
+	peer_route->peer_key.net_id = net_id;
+	peer_route->peer_key.id = id;
+	peer_route->peer_key.queue = queue;
+	peer_route->port = port;
+	peer_route->conn = conn;
+
+	list_add(&peer_route->list, &global_route);
+
+	return 0;
+}
+
+static void del_global_route(struct kdbr_port *port,
+			     struct kdbr_connection *conn)
+{
+	struct peer_route *pos, *next;
+
+	list_for_each_entry_safe(pos, next, &global_route, list) {
+		if ((pos->port == port) && (pos->conn == conn)) {
+			list_del(&pos->list);
+			kfree(pos);
+			return;
+		}
+	}
+}
+
+static int get_global_route(unsigned long net_id, unsigned long id,
+			    unsigned long queue, struct kdbr_port **port,
+			    struct kdbr_connection **conn)
+{
+	struct peer_route *pos, *next;
+
+	list_for_each_entry_safe(pos, next, &global_route, list) {
+		if ((pos->peer_key.net_id == net_id) && (pos->peer_key.id
== id)
+		    && (pos->peer_key.queue == queue)) {
+			*port = pos->port;
+			*conn = pos->conn;
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static int kdbr_port_open(struct inode *inode, struct file *filp) {
+	struct kdbr_port *port;
+
+	port = container_of(inode->i_cdev, struct kdbr_port, cdev);
+	filp->private_data = port;
+
+	if (!port) {
+		pr_debug("kdbr: port open - no port data\n");
+		return -1;
+	}
+
+	if (port->id <= 0) {
+		pr_debug("kdbr: port open - bad port id %d\n", port->id);
+		return -1;
+	}
+
+	pr_info("kdbr: port opened with id %d\n", port->id);
+	return 0;
+}
+
+static int kdbr_port_release(struct inode *inode, struct file *filp) {
+	struct kdbr_port *port;
+
+	port = filp->private_data;
+	if (!port) {
+		pr_debug("kdbr: no port data\n");
+		return 0;
+	}
+
+	pr_info("kdbr port %d closed\n", port->id);
+	return 0;
+}
+
+static void kdbr_print_iovec(const struct iovec *vec, int vlen) {
+	int i;
+
+	for (i = 0; i < vlen; i++)
+		pr_debug("addr %p, len %ld", vec[i].iov_base,
vec[i].iov_len);
+
+	pr_debug("\n");
+}
+
+int post_cqe(struct kdbr_port *port, int connection_id, unsigned long
req_id,
+	     int status)
+{
+	struct kdbr_completion_elem *comp_elem;
+
+	pr_debug("post_cqe: port_id=%d, connection_id=%d, req_id=%ld,
status=0x%x\n",
+		 port->id, connection_id, req_id, status);
+
+	comp_elem = kmalloc(sizeof(struct kdbr_completion_elem),
GFP_KERNEL);
+	if (!comp_elem) {
+		pr_debug("Fail to allocate completion-event\n");
+		return -EINVAL;
+	}
+	comp_elem->comp.req_id = req_id;
+	comp_elem->comp.status = status;
+	comp_elem->comp.connection_id = connection_id;
+	mutex_lock(&port->comps.lock);
+	list_add_tail(&comp_elem->list, &port->comps.list);
+	mutex_unlock(&port->comps.lock);
+
+	port->comps.data_flag = 1;
+
+	wake_up_interruptible(&port->comps.queue);
+
+	return 0;
+}
+
+static struct kdbr_connection *get_connection(struct kdbr_port *port,
+					       int conn_id)
+{
+	struct kdbr_connection *conn;
+
+	mutex_lock(&port->conn_mutex);
+	conn = idr_find(&port->conn_idr, conn_id);
+	mutex_unlock(&port->conn_mutex);
+	if (!conn)
+		pr_debug("Fail to find connection %d for port %d\n",
+			 conn_id, port->id);
+
+	return conn;
+}
+
+static void add_sg_vec(struct kdbr_connection *conn, struct sg_vec
+*sg_vec) {
+	mutex_lock(conn->sg_vecs_mutex);
+	list_add(&sg_vec->list, conn->sg_vecs_list);
+	mutex_unlock(conn->sg_vecs_mutex);
+}
+
+static struct sg_vec *get_sg_vec(struct kdbr_connection *conn, int
+vlen) {
+	struct sg_vec *sg_vec = NULL;
+
+	mutex_lock(conn->sg_vecs_mutex);
+	if (list_empty(conn->sg_vecs_list)) {
+		pr_debug("conn %ld: No more buffers\n", conn->queue_id);
+		goto out;
+	}
+
+	sg_vec = list_last_entry(conn->sg_vecs_list, struct sg_vec, list);
+	if (!sg_vec) {
+		pr_debug("conn %ld: No more buffers\n", conn->queue_id);
+		goto out;
+	}
+
+	/* TODO: This limited implementation works only when top of the
+	 * list sg_vec has at least requested buffers.
+	 * In addition, no check is made with regards to buffers sizes */
+	if (sg_vec->vlen < vlen) {
+		pr_err("conn %ld: Top sg_vec is small (%d<%d)\n",
+		       conn->queue_id, sg_vec->vlen, vlen);
+		goto out;
+	}
+
+	list_del(&sg_vec->list);
+
+out:
+	mutex_unlock(conn->sg_vecs_mutex);
+	return sg_vec;
+}
+
+static int kdbr_port_recv(struct kdbr_port *port, struct kdbr_req *req)
+{
+	int rc, i;
+	int nr_pages;
+	int pg_offs;
+	struct kdbr_connection *conn;
+	struct sg_vec *sg_vec;
+
+	if (!req->vlen)
+		return post_cqe(port, req->connection_id, req->req_id,
+				KDBR_ERR_CODE_EMPTY_VEC);
+	kdbr_print_iovec(req->vec, req->vlen);
+
+	conn = get_connection(port, req->connection_id);
+	if (!conn)
+		return post_cqe(port, req->connection_id, req->req_id,
+				KDBR_ERR_CODE_INV_CONN_ID);
+
+	sg_vec = kmalloc(sizeof(*sg_vec), GFP_KERNEL);
+	sg_vec->vlen = req->vlen;
+	sg_vec->connection_id = req->connection_id;
+	sg_vec->req_id = req->req_id;
+
+	for (i = 0; i < sg_vec->vlen; i++) {
+		nr_pages = (req->vec[i].iov_len + PAGE_SIZE - 1) >>
PAGE_SHIFT;
+		pg_offs = (unsigned long)req->vec[i].iov_base & (PAGE_SIZE
- 1);
+
+		/* TODO: Need optimization when several buffers share the
+		 * same page */
+		rc = get_user_pages_fast((unsigned long)req-
vec[i].iov_base -
+					 pg_offs, nr_pages, 1,
+					 &sg_vec->userpage[i]);
+		if (rc != nr_pages) {
+			pr_debug("get_user_pages_fast, requested %d, got
%d\n",
+				 nr_pages, rc);
+			return post_cqe(port, req->connection_id, req-
req_id,
+					KDBR_ERR_CODE_INV_ADDR);
+		}
+
+		sg_vec->userptr[i] = kmap(sg_vec->userpage[i]) + pg_offs;
+		if (!sg_vec->userptr[i]) {
+			pr_debug("kmap = NULL\n");
+			return post_cqe(port, req->connection_id, req-
req_id,
+					KDBR_ERR_CODE_INV_ADDR);
+		}
+	}
+
+	add_sg_vec(conn, sg_vec);
+
+	return 0;
+}
+
+static int kdbr_port_send(struct kdbr_port *port, struct kdbr_req *req)
+{
+	int rc = 0, i;
+	struct kdbr_peer *peer;
+	struct kdbr_port *rport;
+	struct kdbr_connection *conn, *rconn;
+	struct sg_vec *sg_vec;
+
+	pr_debug("kdbr_port_send, conn id %d, remote net id 0x%lx,
remote id 0x%lx, remote queue %ld\n",
+		 req->connection_id, req->peer.rgid.net_id,
+		 req->peer.rgid.id, req->peer.rqueue);
+	kdbr_print_iovec(req->vec, req->vlen);
+
+	if (!req->vlen)
+		return post_cqe(port, req->connection_id, req->req_id,
+				KDBR_ERR_CODE_EMPTY_VEC);
+
+	/* Get peer attributes */
+	if (req->connection_id) {
+		conn = get_connection(port, req->connection_id);
+		if (!conn)
+			return post_cqe(port, req->connection_id, req-
req_id,
+					KDBR_ERR_CODE_INV_CONN_ID);
+		peer = &conn->peer;
+	} else {
+		peer = &req->peer;
+	}
+	rc = get_global_route(peer->rgid.net_id, peer->rgid.id, peer-
rqueue,
+			      &rport, &rconn);
+	if (rc)
+		return post_cqe(port, req->connection_id, req->req_id,
+				KDBR_ERR_CODE_NO_PEER);
+
+	/* Get next available buffers */
+	sg_vec = get_sg_vec(rconn, req->vlen);
+	if (!sg_vec)
+		return post_cqe(port, req->connection_id, req->req_id,
+				KDBR_ERR_CODE_NO_MORE_RECV_BUF);
+
+	/* Copy to peer */
+	for (i = 0; i < req->vlen; i++) {
+		rc = copy_from_user(sg_vec->userptr[i], req-
vec[i].iov_base,
+				    req->vec[i].iov_len);
+		if (rc) {
+			post_cqe(rport, sg_vec->connection_id, sg_vec-
req_id,
+				 KDBR_ERR_CODE_RECV_BUF_PROT);
+			kfree(sg_vec);
+			return post_cqe(port, req->connection_id, req-
req_id,
+					KDBR_ERR_CODE_RECV_BUF_PROT);
+		}
+
+		SetPageDirty(sg_vec->userpage[i]);
+		kunmap(sg_vec->userptr[i]);
+		put_page(sg_vec->userpage[i]);
+	}
+
+	/* Post recv completion to peer */
+	post_cqe(rport, sg_vec->connection_id, sg_vec->req_id, 0);
+
+	kfree(sg_vec);
+
+	/* Post send completion to requester */
+	return post_cqe(port, req->connection_id, req->req_id, 0); }
+
+static int kdbr_open_connection(struct kdbr_port *port,
+				struct kdbr_connection *user_conn)
+{
+	int id, ret;
+	struct kdbr_connection *conn;
+
+	conn = kmalloc(sizeof(*conn), GFP_KERNEL);
+	if (conn == NULL) {
+		pr_debug("Fail to alloc conn obj\n");
+		return -ENOMEM;
+	}
+
+	memcpy(conn, user_conn, sizeof(*conn));
+	conn->sg_vecs_list = kmalloc(sizeof(*conn->sg_vecs_list),
GFP_KERNEL);
+	INIT_LIST_HEAD(conn->sg_vecs_list);
+	conn->sg_vecs_mutex = kmalloc(sizeof(*conn->sg_vecs_mutex),
GFP_KERNEL);
+	mutex_init(conn->sg_vecs_mutex);
+
+	idr_preload(GFP_KERNEL);
+	mutex_lock(&port->conn_mutex);
+
+	id = idr_alloc(&port->conn_idr, conn, 1, 0, GFP_KERNEL);
+
+	mutex_unlock(&port->conn_mutex);
+	idr_preload_end();
+	if (id  <  0) {
+		ret = id;
+		goto err_conn;
+	}
+
+	ret = add_global_route(port->gid.net_id, port->gid.id, conn-
queue_id,
+			       port, conn);
+	if (ret) {
+		/* TODO: Undo idr_alloc */
+		ret = -ENOMEM;
+		goto err_conn;
+	}
+
+	pr_info("kdbr open conn %d, r_net_id=0x%lx, r_id=0x%lx on port
%d\n",
+		id, conn->peer.rgid.net_id, conn->peer.rgid.id, port->id);
+
+	return id;
+
+err_conn:
+	kfree(conn);
+
+	return ret;
+}
+
+static int kdbr_close_connection(struct kdbr_port *port, int conn_id) {
+	struct kdbr_connection *conn;
+	int ret;
+
+	mutex_lock(&port->conn_mutex);
+	conn = idr_find(&port->conn_idr, conn_id);
+	if (conn == NULL) {
+		ret = -ENODEV;
+		pr_debug("kdbr close connection, can't find id %d\n",
conn_id);
+		goto err;
+	}
+	del_global_route(port, conn);
+	kfree(conn->sg_vecs_list);
+	kfree(conn->sg_vecs_mutex);
+
+	idr_remove(&port->conn_idr, conn_id);
+	kfree(conn);
+
+	mutex_unlock(&port->conn_mutex);
+
+	pr_info("kdbr close conn %d, r_net_id=0x%lx, r_id=0x%lx on port
%d\n",
+		conn_id, conn->peer.rgid.net_id, conn->peer.rgid.id, port-
id);
+
+	return 0;
+
+err:
+	mutex_unlock(&port->conn_mutex);
+
+	return ret;
+}
+
+static long kdbr_port_ioctl(struct file *filp, unsigned int cmd,
+			     unsigned long arg)
+{
+	int ret, conn_id;
+	struct kdbr_connection conn;
+
+	pr_debug("kdbr driver ioctl called\n");
+
+	if (_IOC_TYPE(cmd) != KDBR_PORT_IOC_MAGIC)
+		return -ENOTTY;
+
+	if (_IOC_NR(cmd) > KDBR_PORT_IOC_MAX)
+		return -ENOTTY;
+
+	switch (cmd) {
+	case KDBR_PORT_OPEN_CONN:
+		ret = copy_from_user(&conn,
+				     (struct kdbr_connection __user *)arg,
+				     sizeof(conn));
+		if (!ret)
+			ret = kdbr_open_connection(filp->private_data,
&conn);
+		break;
+	case KDBR_PORT_CLOSE_CONN:
+		ret = get_user(conn_id,  (int __user *)arg);
+		if (!ret)
+			ret = kdbr_close_connection(filp->private_data,
+						    conn_id);
+		break;
+	default:
+		return -ENOTTY;
+	}
+
+	return ret;
+}
+
+ssize_t kdbr_port_read(struct file *file, char __user *buf, size_t size,
+		       loff_t *ppos)
+{
+	struct kdbr_completion_elem *comp_elem, *next;
+	int rc;
+	size_t sz = 0;
+	struct kdbr_port *port = file->private_data;
+
+	wait_event_interruptible(port->comps.queue, port-
comps.data_flag);
+
+	mutex_lock(&port->comps.lock);
+	list_for_each_entry_safe(comp_elem, next, &port->comps.list, list) {
+		if (sz + sizeof(comp_elem->comp) > size)
+			goto out;
+
+		pr_debug("kdbr_port_read: req_id=%ld, status=%d\n",
+			 comp_elem->comp.req_id, comp_elem-
comp.status);
+		rc = copy_to_user(buf + sz, &comp_elem->comp,
+				  sizeof(comp_elem->comp));
+		if (rc < 0) {
+			pr_warn("Fail to copy to user buffer, rc=%d\n", rc);
+			goto out;
+		}
+
+		sz += sizeof(comp_elem->comp);
+		list_del(&comp_elem->list);
+		kfree(comp_elem);
+	}
+
+out:
+	if (list_empty(&port->comps.list))
+		port->comps.data_flag = 0;
+	mutex_unlock(&port->comps.lock);
+
+	return sz;
+}
+
+ssize_t kdbr_port_write(struct file *file, const char __user *buf, size_t size,
+		        loff_t *ppos)
+{
+	int rc, sz = 0;
+	struct kdbr_req req;
+
+	while (1) {
+		if (sz + sizeof(req) > size)
+			goto out;
+
+		rc = copy_from_user(&req,
+				    (struct kdbr_req __user *)(buf + sz),
+				    sizeof(req));
+		if (rc) {
+			pr_debug("Fail to copy from user buf, pos=%d\n",
sz);
+			return sz;
+		}
+
+		if ((req.flags & KDBR_REQ_SIGNATURE) !=
KDBR_REQ_SIGNATURE) {
+			pr_debug("Invalid message signature 0x%x\n",
+				 req.flags & KDBR_REQ_SIGNATURE);
+			return sz;
+		}
+
+		if ((req.flags & KDBR_REQ_POST_RECV) ==
KDBR_REQ_POST_RECV)
+			rc = kdbr_port_recv(file->private_data, &req);
+
+		if ((req.flags & KDBR_REQ_POST_SEND) ==
KDBR_REQ_POST_SEND)
+			rc = kdbr_port_send(file->private_data, &req);
+
+		sz += sizeof(req);
+	}
+
+out:
+	return sz;
+}
+
+static const struct file_operations kdbr_port_ops = {
+	.owner		= THIS_MODULE,
+	.unlocked_ioctl	= kdbr_port_ioctl,
+	.open		= kdbr_port_open,
+	.release	= kdbr_port_release,
+	.read		= kdbr_port_read,
+	.write		= kdbr_port_write,
+};
+
+static int kdbr_conn_idr_cleanup(int id, void *p, void *data) {
+	kfree((struct kdbr_connection *)p);
+
+	return 0;
+}
+
+static void kdbr_delete_port(struct kdbr_port *port) {
+	spin_lock_irq(&kdbr_data.lock);
+
+	list_del(&port->list);
+	clear_bit(port->id, kdbr_data.port_map);
+
+	spin_unlock_irq(&kdbr_data.lock);
+
+	idr_for_each(&port->conn_idr, kdbr_conn_idr_cleanup, NULL);
+	idr_destroy(&port->conn_idr);
+}
+
+static void kdbr_destroy_device(struct kdbr_port *port) {
+	device_destroy(kdbr_data.class, port->cdev.dev);
+	cdev_del(&port->cdev);
+	kfree(port);
+}
+
+static int kdbr_unregister_port(int id) {
+	struct kdbr_port *port = NULL, *port2 = NULL;
+
+	if (id <= 0 || id > KDBR_MAX_PORTS) {
+		pr_debug("kdbr: unregister device, bad port id %d\n", port-
id);
+		return -EINVAL;
+	}
+
+	list_for_each_entry_safe(port, port2, &kdbr_data.ports, list) {
+		if (port->id == id) {
+			pr_info("Unregistered device on port %d\n", port-
id);
+			kdbr_delete_port(port);
+			kdbr_destroy_device(port);
+			return 0;
+		}
+	}
+
+	return -ENODEV;
+}
+
+static int kdbr_register_port(struct kdbr_reg *reg) {
+	struct kdbr_port *port;
+	dev_t devt;
+	int id;
+	int ret;
+
+	port = kmalloc(sizeof(*port), GFP_KERNEL);
+	if (!port) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	spin_lock_irq(&kdbr_data.lock);
+
+	id = find_first_zero_bit(kdbr_data.port_map, KDBR_MAX_PORTS);
+	if (id == KDBR_MAX_PORTS) {
+		spin_unlock_irq(&kdbr_data.lock);
+		ret = -ENOSPC;
+		goto fail_port;
+	}
+
+	set_bit(id, kdbr_data.port_map);
+	port->id = id;
+	port->pid = current->pid;
+	port->gid.net_id = reg->gid.net_id;
+	port->gid.id = reg->gid.id;
+	list_add_tail(&port->list, &kdbr_data.ports);
+
+	reg->port = id;
+
+	INIT_LIST_HEAD(&port->comps.list);
+	mutex_init(&port->comps.lock);
+	port->comps.data_flag = 0;
+	init_waitqueue_head(&port->comps.queue);
+
+	spin_unlock_irq(&kdbr_data.lock);
+
+	mutex_init(&port->conn_mutex);
+	idr_init(&port->conn_idr);
+
+	cdev_init(&port->cdev, &kdbr_port_ops);
+	port->cdev.owner = THIS_MODULE;
+	devt = MKDEV(kdbr_data.major, id);
+	ret = cdev_add(&port->cdev, devt, 1);
+	if (ret < 0) {
+		pr_debug("Error %d adding cdev for kdbr port %d\n", ret, id);
+		goto fail_cdev;
+	}
+
+	port->dev = device_create(kdbr_data.class, NULL, devt, port,
"kdbr%d",
+				  id);
+	if (IS_ERR(port->dev)) {
+		ret = PTR_ERR(port->dev);
+		pr_debug("Error %d creating device for kdbr port %d\n", ret,
+			 id);
+		goto fail_cdev;
+	}
+
+	pr_info("Registered device with gid [0x%lx,0x%lx] on port %d major
%d\n",
+		port->gid.net_id, port->gid.id, port->id,
+		kdbr_data.major);
+
+	return 0;
+
+fail_cdev:
+	cdev_del(&port->cdev);
+	kdbr_delete_port(port);
+
+fail_port:
+	kfree(port);
+
+fail:
+	return ret;
+}
+
+static long kdbr_ioctl(struct file *filp, unsigned int cmd, unsigned
+long arg) {
+	int ret, port;
+	struct kdbr_reg reg;
+
+	pr_debug("kdbr driver ioctl called\n");
+
+	if (_IOC_TYPE(cmd) != KDBR_IOC_MAGIC)
+		return -ENOTTY;
+
+	if (_IOC_NR(cmd) > KDBR_IOC_MAX)
+		return -ENOTTY;
+
+	switch (cmd) {
+	case KDBR_REGISTER_PORT:
+		ret = copy_from_user(&reg, (struct kdbr_reg __user *)arg,
+				     sizeof(reg));
+		if (ret)
+			return -EFAULT;
+
+		ret = kdbr_register_port(&reg);
+		if (!ret)
+			ret = copy_to_user((struct kdbr_reg __user *)arg,
+					   &reg, sizeof(reg));
+
+		break;
+	case KDBR_UNREGISTER_PORT:
+		ret = get_user(port, (int __user *)arg);
+		if (!ret)
+			ret = kdbr_unregister_port(port);
+		break;
+	default:
+		return -ENOTTY;
+	}
+
+	return ret;
+}
+
+int kdbr_release(struct inode *inode, struct file *filp) {
+	return 0;
+}
+
+static const struct file_operations kdbr_ops = {
+	.owner		= THIS_MODULE,
+	.unlocked_ioctl	= kdbr_ioctl,
+	.open		= nonseekable_open,
+	.release	= kdbr_release,
+};
+
+static int __init kdbr_init(void)
+{
+	dev_t devt;
+	int ret;
+
+	ret = alloc_chrdev_region(&devt, 0, KDBR_MAX_PORTS, "kdbr");
+	if (ret < 0) {
+		pr_debug("Error %d allocating chrdev region for kdbr\n",
ret);
+		return ret;
+	}
+	kdbr_data.major = MAJOR(devt);
+
+	cdev_init(&kdbr_data.cdev, &kdbr_ops);
+	kdbr_data.cdev.owner = THIS_MODULE;
+	ret = cdev_add(&kdbr_data.cdev, devt, 1);
+	if (ret < 0) {
+		pr_debug("Error %d adding cdev for kdbr\n", ret);
+		goto fail_chrdev;
+	}
+
+	kdbr_data.class = class_create(THIS_MODULE, "kdbr");
+	if (IS_ERR(kdbr_data.class)) {
+		ret = PTR_ERR(kdbr_data.class);
+		pr_debug("Error %d creating kdbr-class\n", ret);
+		goto fail_cdev;
+	}
+
+	kdbr_data.dev = device_create(kdbr_data.class, NULL, devt, NULL,
+				      "kdbr");
+	if (IS_ERR(kdbr_data.dev)) {
+		ret = PTR_ERR(kdbr_data.dev);
+		pr_debug("Error %d creating kdbr device\n", ret);
+		goto fail_class;
+	}
+
+	spin_lock_init(&kdbr_data.lock);
+	INIT_LIST_HEAD(&kdbr_data.ports);
+
+	INIT_LIST_HEAD(&global_route);
+
+	/* minor 0 is used by the kdbr device */
+	set_bit(0, kdbr_data.port_map);
+
+	pr_info("kdbr driver loaded\n");
+	return 0;
+
+
+fail_class:
+	class_destroy(kdbr_data.class);
+
+fail_cdev:
+	cdev_del(&kdbr_data.cdev);
+
+fail_chrdev:
+	unregister_chrdev_region(devt, KDBR_MAX_PORTS);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kdbr_init);
+
+static void __exit kdbr_exit(void)
+{
+	struct kdbr_port *port = NULL, *port2 = NULL;
+
+	list_for_each_entry_safe(port, port2, &kdbr_data.ports, list) {
+		kdbr_delete_port(port);
+		kdbr_destroy_device(port);
+	}
+
+	device_destroy(kdbr_data.class, MKDEV(kdbr_data.major, 0));
+	class_destroy(kdbr_data.class);
+	cdev_del(&kdbr_data.cdev);
+	unregister_chrdev_region(MKDEV(kdbr_data.major, 0),
KDBR_MAX_PORTS);
+
+	pr_info("kdbr driver unloaded\n");
+}
+EXPORT_SYMBOL_GPL(kdbr_exit);
+
+module_init(kdbr_init);
+module_exit(kdbr_exit);
diff --git a/kdbr_test.c b/kdbr_test.c
new file mode 100644
index 0000000..d288826
--- /dev/null
+++ b/kdbr_test.c
@@ -0,0 +1,193 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <linux/ioctl.h>
+#include <string.h>
+#include <termios.h>
+#include <errno.h>
+#include "kdbr.h"
+
+static int register_port(int kdbr_fd, struct kdbr_reg *reg) {
+    int ret;
+
+    printf("Registering port net_id 0x%lx, id 0x%lx\n", reg->gid.net_id,
+           reg->gid.id);
+    ret = ioctl(kdbr_fd, KDBR_REGISTER_PORT, reg);
+    if (ret == -1) {
+        fprintf(stderr, "KDBR_REGISTER_DEVICE failed: %s\n", strerror(ret));
+        return ret;
+    }
+
+    printf("Registered to kdbr port %d\n", reg->port);
+
+    return 0;
+}
+
+static int unregister_port(int kdbr_fd, int port) {
+    int ret;
+
+    printf("kdbr unregister port at port %d\n", port);
+    ret = ioctl(kdbr_fd, KDBR_UNREGISTER_PORT, &port);
+    if (ret == -1)
+        fprintf(stderr, "KDBR_UNREGISTER_DEVICE failed: %s\n",
+ strerror(ret));
+
+    return ret;
+}
+
+static int connect(int port_fd, struct kdbr_connection *conn) {
+    int ret;
+
+    printf("Connecting queue 0x%lx to net_id 0x%lx id 0x%lx rqueue
0x%lx\n",
+           conn->queue_id, conn->peer.rgid.net_id, conn->peer.rgid.id,
+	   conn->peer.rqueue);
+
+    ret = ioctl(port_fd, KDBR_PORT_OPEN_CONN, conn);
+    if (ret == -1) {
+        fprintf(stderr, "KDBR_PORT_OPEN_CONN failed: %s\n", strerror(ret));
+        return ret;
+    }
+
+    printf("Connected (conn_id %d)\n", ret);
+
+    return ret;
+}
+
+static int disconnect(int port_fd, int conn_id) {
+    int ret;
+
+    ret = ioctl(port_fd, KDBR_PORT_CLOSE_CONN, &conn_id);
+    if (ret == -1) {
+        fprintf(stderr, "KDBR_PORT_CLOSE_CONN failed: conn %d %s\n",
+		conn_id, strerror(ret));
+        return ret;
+    }
+
+    printf("kdbr closed connection %d\n", conn_id);
+    return 0;
+}
+
+int main(int argc, char **argv)
+{
+    int kdbr_fd, port_fd, err, conn_id, opt, sender = 0;
+    char kdbr_port_name[80] = {0};
+    struct kdbr_req sreq;
+    struct kdbr_connection conn = {0};
+    struct kdbr_reg reg = {0};
+    char *buf = malloc(20);
+
+    /* Open kdbr device file */
+    kdbr_fd = open(KDBR_FILE_NAME, 0);
+    if (kdbr_fd < 0) {
+        printf("Can't open device file: %s\n", KDBR_FILE_NAME);
+        exit(-1);
+    }
+    printf("kdbr fd opened\n");
+
+    while ((opt = getopt (argc, argv, "sg:p:q:r:")) != -1) {
+        switch (opt) {
+        case 's':  /* sender */
+            sender = 1;
+            break;
+	case 'g':  /* gid */
+	    reg.gid.id = atoi(optarg);
+	    break;
+	case 'p': /* remote gid */
+	    conn.peer.rgid.id = atoi(optarg);
+	    break;
+	case 'q':   /* queue id */
+            conn.queue_id = atoi(optarg);
+	    break;
+	case 'r':  /* remote queue id */
+	    conn.peer.rqueue = atoi(optarg);
+	    break;
+        default:
+            exit(1);
+        }
+    }
+
+    /* Register our port (net_id and id) */
+    err = register_port(kdbr_fd, &reg);
+    if (err)
+        goto fail_kdbr_fd;
+
+    /* Open the new port's device file */
+    sprintf(kdbr_port_name, KDBR_FILE_NAME "%d", reg.port);
+    port_fd = open(kdbr_port_name, O_RDWR);
+    if (port_fd < 0) {
+        err = port_fd;
+        printf("Can't open port file: %s%d, error %d\n", KDBR_FILE_NAME,
+               reg.port, errno);
+        goto fail_kdbr_fd;
+    }
+
+    /* Connect to peer */
+    conn_id = connect(port_fd, &conn);
+    if (conn_id < 0)
+        goto fail_port;
+
+    if (!sender) {
+        /* Register buffer for receive  */
+        struct kdbr_req rreq;
+
+        rreq.vec[0].iov_base = buf;
+        rreq.vec[0].iov_len = 20;
+        rreq.vlen = 1;
+	rreq.connection_id = conn_id;
+
+	rreq.flags = KDBR_REQ_SIGNATURE | KDBR_REQ_POST_RECV;
+	err = write(port_fd, &rreq, sizeof(rreq));
+	if (err < 0) {
+		printf("write: err=%d, errno=%d\n", err, errno);
+		goto fail_conn;
+	}
+        printf("Buffer registered, press any key to continue\n");
+        getchar();
+        printf("Message received buf='%s'\n", buf);
+    } else {
+        /* Send buffer to peer */
+        strcpy(buf, "Hello world!!!");
+        sreq.vec[0].iov_base = buf;
+        sreq.vec[0].iov_len = 20;
+        sreq.vlen = 1;
+	sreq.flags = KDBR_REQ_SIGNATURE | KDBR_REQ_POST_SEND;
+	sreq.connection_id = conn_id;
+        printf("Press any key to send message\n");
+        getchar();
+	err = write(port_fd, &sreq, sizeof(sreq));
+	if (err < 0) {
+		printf("write: err=%d, errno=%d\n", err, errno);
+		goto fail_conn;
+	}
+    }
+
+    disconnect(port_fd, conn_id);
+    close(port_fd);
+
+    unregister_port(kdbr_fd, reg.port);
+    close(kdbr_fd);
+    printf("kdbr fd and port %d closed\n", reg.port);
+    err = 0;
+    goto out;
+
+fail_conn:
+    disconnect(port_fd, conn_id);
+
+fail_port:
+    close(port_fd);
+
+fail_kdbr_fd:
+    close(kdbr_fd);
+    printf("kdbr fd closed\n");
+
+out:
+    free(buf);
+    return err;
+
+}
+
--
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux