Re: [Qemu-devel] [PATCH RFC] hw/pvrdma: Proposal of a new pvrdma device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 30, 2017 at 02:12:21PM +0300, Marcel Apfelbaum wrote:
> From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
>
>  Hi,
>
>  General description
>  ===================
>  This is a very early RFC of a new RoCE emulated device
>  that enables guests to use the RDMA stack without having
>  a real hardware in the host.
>
>  The current implementation supports only VM to VM communication
>  on the same host.
>  Down the road we plan to make possible to be able to support
>  inter-machine communication by utilizing physical RoCE devices
>  or Soft RoCE.
>
>  The goals are:
>  - Reach fast and secure loos-less Inter-VM data exchange.
>  - Support remote VMs or bare metal machines.
>  - Allow VMs migration.
>  - Do not require to pin all VM memory.
>
>
>  Objective
>  =========
>  Have a QEMU implementation of the PVRDMA device. We aim to do so without
>  any change in the PVRDMA guest driver which is already merged into the
>  upstream kernel.
>
>
>  RFC status
>  ===========
>  The project is in early development stages and supports
>  only basic send/receive operations.
>
>  We present it so we can get feedbacks on design,
>  feature demands and to receive comments from the
>  community pointing us to the "right" direction.

If to judge by the feedback which you got from RDMA community
for kernel proposal [1], this community failed to understand:
1. Why do you need new module?
2. Why existing solutions are not enough and can't be extended?
3. Why RXE (SoftRoCE) can't be extended to perform this inter-VM
   communication via virtual NIC?

Can you please help us to fill this knowledge gap?

[1] http://marc.info/?l=linux-rdma&m=149063626907175&w=2

Thanks

>
>  What does work:
>   - Tested with a basic unit-test:
>     - https://github.com/yuvalshaia/kibpingpong .
>   It works fine with two devices on a single VM, has
>   some issue between two VMs in the same host.
>
>
>  Design
>  ======
>  - Follows the behavior of VMware's pvrdma device, however is not tightly
>    coupled with it and most of the code can be reused if we decide to
>    continue to a Virtio based RDMA device.
>
>  - It exposes 3 BARs:
>     BAR 0 - MSIX, utilize 3 vectors for command ring, async events and
>             completions
>     BAR 1 - Configuration of registers
>     BAR 2 - UAR, used to pass HW commands from driver.
>
>  - The device performs internal management of the RDMA
>    resources (PDs, CQs, QPs, ...), meaning the objects
>    are not directly coupled to a physical RDMA device resources.
>
>  - As backend, the pvrdma device uses KDBR, a new kernel module which
>    is also in RFC phase, read more on the linux-rdma list:
>      - https://www.spinics.net/lists/linux-rdma/msg47951.html
>
>  - All RDMA operations are converted to KDBR module calls which performs
>    the actual transfer between VMs, or, in the future,
>    will utilize a RoCE device (either physical or soft) to be able
>    to communicate with another host.
>
>
> Roadmap (out of order)
> ======================
>  - Utilize the RoCE host driver in order to support peers on external hosts.
>  - Re-use the code for a virtio based device.
>
> Any ideas, comments or suggestions would be highly appreciated.
>
> Thanks,
> Yuval Shaia & Marcel Apfelbaum
>
> Signed-off-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> (Mainly design, coding was done by Yuval)
> Signed-off-by: Marcel Apfelbaum <marcel@xxxxxxxxxx>
>
> ---
>  hw/net/Makefile.objs            |   5 +
>  hw/net/pvrdma/kdbr.h            | 104 +++++++
>  hw/net/pvrdma/pvrdma-uapi.h     | 261 ++++++++++++++++
>  hw/net/pvrdma/pvrdma.h          | 155 ++++++++++
>  hw/net/pvrdma/pvrdma_cmd.c      | 322 +++++++++++++++++++
>  hw/net/pvrdma/pvrdma_defs.h     | 301 ++++++++++++++++++
>  hw/net/pvrdma/pvrdma_dev_api.h  | 342 ++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_ib_verbs.h | 469 ++++++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_kdbr.c     | 395 ++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_kdbr.h     |  53 ++++
>  hw/net/pvrdma/pvrdma_main.c     | 667 ++++++++++++++++++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_qp_ops.c   | 174 +++++++++++
>  hw/net/pvrdma/pvrdma_qp_ops.h   |  25 ++
>  hw/net/pvrdma/pvrdma_ring.c     | 127 ++++++++
>  hw/net/pvrdma/pvrdma_ring.h     |  43 +++
>  hw/net/pvrdma/pvrdma_rm.c       | 529 +++++++++++++++++++++++++++++++
>  hw/net/pvrdma/pvrdma_rm.h       | 214 +++++++++++++
>  hw/net/pvrdma/pvrdma_types.h    |  37 +++
>  hw/net/pvrdma/pvrdma_utils.c    |  36 +++
>  hw/net/pvrdma/pvrdma_utils.h    |  49 +++
>  include/hw/pci/pci_ids.h        |   3 +
>  21 files changed, 4311 insertions(+)
>  create mode 100644 hw/net/pvrdma/kdbr.h
>  create mode 100644 hw/net/pvrdma/pvrdma-uapi.h
>  create mode 100644 hw/net/pvrdma/pvrdma.h
>  create mode 100644 hw/net/pvrdma/pvrdma_cmd.c
>  create mode 100644 hw/net/pvrdma/pvrdma_defs.h
>  create mode 100644 hw/net/pvrdma/pvrdma_dev_api.h
>  create mode 100644 hw/net/pvrdma/pvrdma_ib_verbs.h
>  create mode 100644 hw/net/pvrdma/pvrdma_kdbr.c
>  create mode 100644 hw/net/pvrdma/pvrdma_kdbr.h
>  create mode 100644 hw/net/pvrdma/pvrdma_main.c
>  create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.c
>  create mode 100644 hw/net/pvrdma/pvrdma_qp_ops.h
>  create mode 100644 hw/net/pvrdma/pvrdma_ring.c
>  create mode 100644 hw/net/pvrdma/pvrdma_ring.h
>  create mode 100644 hw/net/pvrdma/pvrdma_rm.c
>  create mode 100644 hw/net/pvrdma/pvrdma_rm.h
>  create mode 100644 hw/net/pvrdma/pvrdma_types.h
>  create mode 100644 hw/net/pvrdma/pvrdma_utils.c
>  create mode 100644 hw/net/pvrdma/pvrdma_utils.h
>
> diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
> index 610ed3e..a962347 100644
> --- a/hw/net/Makefile.objs
> +++ b/hw/net/Makefile.objs
> @@ -43,3 +43,8 @@ common-obj-$(CONFIG_ROCKER) += rocker/rocker.o rocker/rocker_fp.o \
>                                 rocker/rocker_desc.o rocker/rocker_world.o \
>                                 rocker/rocker_of_dpa.o
>  obj-$(call lnot,$(CONFIG_ROCKER)) += rocker/qmp-norocker.o
> +
> +obj-$(CONFIG_PCI) += pvrdma/pvrdma_ring.o pvrdma/pvrdma_rm.o \
> +		     pvrdma/pvrdma_utils.o pvrdma/pvrdma_qp_ops.o \
> +		     pvrdma/pvrdma_kdbr.o pvrdma/pvrdma_cmd.o \
> +		     pvrdma/pvrdma_main.o
> diff --git a/hw/net/pvrdma/kdbr.h b/hw/net/pvrdma/kdbr.h
> new file mode 100644
> index 0000000..97cb93c
> --- /dev/null
> +++ b/hw/net/pvrdma/kdbr.h
> @@ -0,0 +1,104 @@
> +/*
> + * Kernel Data Bridge driver - API
> + *
> + * Copyright 2016 Red Hat, Inc.
> + * Copyright 2016 Oracle
> + *
> + * Authors:
> + *   Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *   Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef _KDBR_H
> +#define _KDBR_H
> +
> +#ifdef __KERNEL__
> +#include <linux/uio.h>
> +#define KDBR_MAX_IOVEC_LEN    UIO_FASTIOV
> +#else
> +#include <sys/uio.h>
> +#define KDBR_MAX_IOVEC_LEN    8
> +#endif
> +
> +#define KDBR_FILE_NAME "/dev/kdbr"
> +#define KDBR_MAX_PORTS 255
> +
> +#define KDBR_IOC_MAGIC 0xBA
> +
> +#define KDBR_REGISTER_PORT    _IOWR(KDBR_IOC_MAGIC, 0, struct kdbr_reg)
> +#define KDBR_UNREGISTER_PORT    _IOW(KDBR_IOC_MAGIC, 1, int)
> +#define KDBR_IOC_MAX        2
> +
> +
> +enum kdbr_ack_type {
> +    KDBR_ACK_IMMEDIATE,
> +    KDBR_ACK_DELAYED,
> +};
> +
> +struct kdbr_gid {
> +    unsigned long net_id;
> +    unsigned long id;
> +};
> +
> +struct kdbr_peer {
> +    struct kdbr_gid rgid;
> +    unsigned long rqueue;
> +};
> +
> +struct list_head;
> +struct mutex;
> +struct kdbr_connection {
> +    unsigned long queue_id;
> +    struct kdbr_peer peer;
> +    enum kdbr_ack_type ack_type;
> +    /* TODO: hide the below fields in the .c file */
> +    struct list_head *sg_vecs_list;
> +    struct mutex *sg_vecs_mutex;
> +};
> +
> +struct kdbr_reg {
> +    struct kdbr_gid gid; /* in */
> +    int port; /* out */
> +};
> +
> +#define KDBR_REQ_SIGNATURE    0x000000AB
> +#define KDBR_REQ_POST_RECV    0x00000100
> +#define KDBR_REQ_POST_SEND    0x00000200
> +#define KDBR_REQ_POST_MREG    0x00000300
> +#define KDBR_REQ_POST_RDMA    0x00000400
> +
> +struct kdbr_req {
> +    unsigned int flags; /* 8 bits signature, 8 bits msg_type */
> +    struct iovec vec[KDBR_MAX_IOVEC_LEN];
> +    int vlen; /* <= KDBR_MAX_IOVEC_LEN */
> +    int connection_id;
> +    struct kdbr_peer peer;
> +    unsigned long req_id;
> +};
> +
> +#define KDBR_ERR_CODE_EMPTY_VEC           0x101
> +#define KDBR_ERR_CODE_NO_MORE_RECV_BUF    0x102
> +#define KDBR_ERR_CODE_RECV_BUF_PROT       0x103
> +#define KDBR_ERR_CODE_INV_ADDR            0x104
> +#define KDBR_ERR_CODE_INV_CONN_ID         0x105
> +#define KDBR_ERR_CODE_NO_PEER             0x106
> +
> +struct kdbr_completion {
> +    int connection_id;
> +    unsigned long req_id;
> +    int status; /* 0 = Success */
> +};
> +
> +#define KDBR_PORT_IOC_MAGIC    0xBB
> +
> +#define KDBR_PORT_OPEN_CONN    _IOR(KDBR_PORT_IOC_MAGIC, 0, \
> +                     struct kdbr_connection)
> +#define KDBR_PORT_CLOSE_CONN    _IOR(KDBR_PORT_IOC_MAGIC, 1, int)
> +#define KDBR_PORT_IOC_MAX    4
> +
> +#endif
> +
> diff --git a/hw/net/pvrdma/pvrdma-uapi.h b/hw/net/pvrdma/pvrdma-uapi.h
> new file mode 100644
> index 0000000..0045776
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma-uapi.h
> @@ -0,0 +1,261 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef PVRDMA_UAPI_H
> +#define PVRDMA_UAPI_H
> +
> +#include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <qemu/compiler.h>
> +#include <qemu/atomic.h>
> +
> +#define PVRDMA_VERSION 17
> +
> +#define PVRDMA_UAR_HANDLE_MASK    0x00FFFFFF    /* Bottom 24 bits. */
> +#define PVRDMA_UAR_QP_OFFSET    0        /* Offset of QP doorbell. */
> +#define PVRDMA_UAR_QP_SEND    BIT(30)        /* Send bit. */
> +#define PVRDMA_UAR_QP_RECV    BIT(31)        /* Recv bit. */
> +#define PVRDMA_UAR_CQ_OFFSET    4        /* Offset of CQ doorbell. */
> +#define PVRDMA_UAR_CQ_ARM_SOL    BIT(29)        /* Arm solicited bit. */
> +#define PVRDMA_UAR_CQ_ARM    BIT(30)        /* Arm bit. */
> +#define PVRDMA_UAR_CQ_POLL    BIT(31)        /* Poll bit. */
> +#define PVRDMA_INVALID_IDX    -1        /* Invalid index. */
> +
> +/* PVRDMA atomic compare and swap */
> +struct pvrdma_exp_cmp_swap {
> +    __u64 swap_val;
> +    __u64 compare_val;
> +    __u64 swap_mask;
> +    __u64 compare_mask;
> +};
> +
> +/* PVRDMA atomic fetch and add */
> +struct pvrdma_exp_fetch_add {
> +    __u64 add_val;
> +    __u64 field_boundary;
> +};
> +
> +/* PVRDMA address vector. */
> +struct pvrdma_av {
> +    __u32 port_pd;
> +    __u32 sl_tclass_flowlabel;
> +    __u8 dgid[16];
> +    __u8 src_path_bits;
> +    __u8 gid_index;
> +    __u8 stat_rate;
> +    __u8 hop_limit;
> +    __u8 dmac[6];
> +    __u8 reserved[6];
> +};
> +
> +/* PVRDMA scatter/gather entry */
> +struct pvrdma_sge {
> +    __u64   addr;
> +    __u32   length;
> +    __u32   lkey;
> +};
> +
> +/* PVRDMA receive queue work request */
> +struct pvrdma_rq_wqe_hdr {
> +    __u64 wr_id;        /* wr id */
> +    __u32 num_sge;        /* size of s/g array */
> +    __u32 total_len;    /* reserved */
> +};
> +/* Use pvrdma_sge (ib_sge) for receive queue s/g array elements. */
> +
> +/* PVRDMA send queue work request */
> +struct pvrdma_sq_wqe_hdr {
> +    __u64 wr_id;        /* wr id */
> +    __u32 num_sge;        /* size of s/g array */
> +    __u32 total_len;    /* reserved */
> +    __u32 opcode;        /* operation type */
> +    __u32 send_flags;    /* wr flags */
> +    union {
> +        __u32 imm_data;
> +        __u32 invalidate_rkey;
> +    } ex;
> +    __u32 reserved;
> +    union {
> +        struct {
> +            __u64 remote_addr;
> +            __u32 rkey;
> +            __u8 reserved[4];
> +        } rdma;
> +        struct {
> +            __u64 remote_addr;
> +            __u64 compare_add;
> +            __u64 swap;
> +            __u32 rkey;
> +            __u32 reserved;
> +        } atomic;
> +        struct {
> +            __u64 remote_addr;
> +            __u32 log_arg_sz;
> +            __u32 rkey;
> +            union {
> +                struct pvrdma_exp_cmp_swap  cmp_swap;
> +                struct pvrdma_exp_fetch_add fetch_add;
> +            } wr_data;
> +        } masked_atomics;
> +        struct {
> +            __u64 iova_start;
> +            __u64 pl_pdir_dma;
> +            __u32 page_shift;
> +            __u32 page_list_len;
> +            __u32 length;
> +            __u32 access_flags;
> +            __u32 rkey;
> +        } fast_reg;
> +        struct {
> +            __u32 remote_qpn;
> +            __u32 remote_qkey;
> +            struct pvrdma_av av;
> +        } ud;
> +    } wr;
> +};
> +/* Use pvrdma_sge (ib_sge) for send queue s/g array elements. */
> +
> +/* Completion queue element. */
> +struct pvrdma_cqe {
> +    __u64 wr_id;
> +    __u64 qp;
> +    __u32 opcode;
> +    __u32 status;
> +    __u32 byte_len;
> +    __u32 imm_data;
> +    __u32 src_qp;
> +    __u32 wc_flags;
> +    __u32 vendor_err;
> +    __u16 pkey_index;
> +    __u16 slid;
> +    __u8 sl;
> +    __u8 dlid_path_bits;
> +    __u8 port_num;
> +    __u8 smac[6];
> +    __u8 reserved2[7]; /* Pad to next power of 2 (64). */
> +};
> +
> +struct pvrdma_ring {
> +    int prod_tail;    /* Producer tail. */
> +    int cons_head;    /* Consumer head. */
> +};
> +
> +struct pvrdma_ring_state {
> +    struct pvrdma_ring tx;    /* Tx ring. */
> +    struct pvrdma_ring rx;    /* Rx ring. */
> +};
> +
> +static inline int pvrdma_idx_valid(__u32 idx, __u32 max_elems)
> +{
> +    /* Generates fewer instructions than a less-than. */
> +    return (idx & ~((max_elems << 1) - 1)) == 0;
> +}
> +
> +static inline __s32 pvrdma_idx(int *var, __u32 max_elems)
> +{
> +    unsigned int idx = atomic_read(var);
> +
> +    if (pvrdma_idx_valid(idx, max_elems)) {
> +        return idx & (max_elems - 1);
> +    }
> +    return PVRDMA_INVALID_IDX;
> +}
> +
> +static inline void pvrdma_idx_ring_inc(int *var, __u32 max_elems)
> +{
> +    __u32 idx = atomic_read(var) + 1;    /* Increment. */
> +
> +    idx &= (max_elems << 1) - 1;        /* Modulo size, flip gen. */
> +    atomic_set(var, idx);
> +}
> +
> +static inline __s32 pvrdma_idx_ring_has_space(const struct pvrdma_ring *r,
> +                          __u32 max_elems, __u32 *out_tail)
> +{
> +    const __u32 tail = atomic_read(&r->prod_tail);
> +    const __u32 head = atomic_read(&r->cons_head);
> +
> +    if (pvrdma_idx_valid(tail, max_elems) &&
> +        pvrdma_idx_valid(head, max_elems)) {
> +        *out_tail = tail & (max_elems - 1);
> +        return tail != (head ^ max_elems);
> +    }
> +    return PVRDMA_INVALID_IDX;
> +}
> +
> +static inline __s32 pvrdma_idx_ring_has_data(const struct pvrdma_ring *r,
> +                         __u32 max_elems, __u32 *out_head)
> +{
> +    const __u32 tail = atomic_read(&r->prod_tail);
> +    const __u32 head = atomic_read(&r->cons_head);
> +
> +    if (pvrdma_idx_valid(tail, max_elems) &&
> +        pvrdma_idx_valid(head, max_elems)) {
> +        *out_head = head & (max_elems - 1);
> +        return tail != head;
> +    }
> +    return PVRDMA_INVALID_IDX;
> +}
> +
> +static inline bool pvrdma_idx_ring_is_valid_idx(const struct pvrdma_ring *r,
> +                        __u32 max_elems, __u32 *idx)
> +{
> +    const __u32 tail = atomic_read(&r->prod_tail);
> +    const __u32 head = atomic_read(&r->cons_head);
> +
> +    if (pvrdma_idx_valid(tail, max_elems) &&
> +        pvrdma_idx_valid(head, max_elems) &&
> +        pvrdma_idx_valid(*idx, max_elems)) {
> +        if (tail > head && (*idx < tail && *idx >= head)) {
> +            return true;
> +        } else if (head > tail && (*idx >= head || *idx < tail)) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +#endif /* PVRDMA_UAPI_H */
> diff --git a/hw/net/pvrdma/pvrdma.h b/hw/net/pvrdma/pvrdma.h
> new file mode 100644
> index 0000000..d6349d4
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma.h
> @@ -0,0 +1,155 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_PVRDMA_H
> +#define PVRDMA_PVRDMA_H
> +
> +#include <qemu/osdep.h>
> +#include <hw/pci/pci.h>
> +#include <hw/pci/msix.h>
> +#include <hw/net/pvrdma/pvrdma_kdbr.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/pvrdma_defs.h>
> +#include <hw/net/pvrdma/pvrdma_dev_api.h>
> +#include <hw/net/pvrdma/pvrdma_ring.h>
> +
> +/* BARs */
> +#define RDMA_MSIX_BAR_IDX    0
> +#define RDMA_REG_BAR_IDX     1
> +#define RDMA_UAR_BAR_IDX     2
> +#define RDMA_BAR0_MSIX_SIZE  (16 * 1024)
> +#define RDMA_BAR1_REGS_SIZE  256
> +#define RDMA_BAR2_UAR_SIZE   (16 * 1024)
> +
> +/* MSIX */
> +#define RDMA_MAX_INTRS       3
> +#define RDMA_MSIX_TABLE      0x0000
> +#define RDMA_MSIX_PBA        0x2000
> +
> +/* Interrupts Vectors */
> +#define INTR_VEC_CMD_RING            0
> +#define INTR_VEC_CMD_ASYNC_EVENTS    1
> +#define INTR_VEC_CMD_COMPLETION_Q    2
> +
> +/* HW attributes */
> +#define PVRDMA_HW_NAME       "pvrdma"
> +#define PVRDMA_HW_VERSION    17
> +#define PVRDMA_FW_VERSION    14
> +
> +/* Vendor Errors, codes 100 to FFF kept for kdbr */
> +#define VENDOR_ERR_TOO_MANY_SGES    0x201
> +#define VENDOR_ERR_NOMEM            0x202
> +#define VENDOR_ERR_FAIL_KDBR        0x203
> +
> +typedef struct HWResourceIDs {
> +    unsigned long *local_bitmap;
> +    __u32 *hw_map;
> +} HWResourceIDs;
> +
> +typedef struct DSRInfo {
> +    dma_addr_t dma;
> +    struct pvrdma_device_shared_region *dsr;
> +
> +    union pvrdma_cmd_req *req;
> +    union pvrdma_cmd_resp *rsp;
> +
> +    struct pvrdma_ring *async_ring_state;
> +    Ring async;
> +
> +    struct pvrdma_ring *cq_ring_state;
> +    Ring cq;
> +} DSRInfo;
> +
> +typedef struct PVRDMADev {
> +    PCIDevice parent_obj;
> +    MemoryRegion msix;
> +    MemoryRegion regs;
> +    __u32 regs_data[RDMA_BAR1_REGS_SIZE];
> +    MemoryRegion uar;
> +    __u32 uar_data[RDMA_BAR2_UAR_SIZE];
> +    DSRInfo dsr_info;
> +    int interrupt_mask;
> +    RmPort ports[MAX_PORTS];
> +    u64 sys_image_guid;
> +    u64 node_guid;
> +    u64 network_prefix;
> +    RmResTbl pd_tbl;
> +    RmResTbl mr_tbl;
> +    RmResTbl qp_tbl;
> +    RmResTbl cq_tbl;
> +    RmResTbl wqe_ctx_tbl;
> +} PVRDMADev;
> +#define PVRDMA_DEV(dev) OBJECT_CHECK(PVRDMADev, (dev), PVRDMA_HW_NAME)
> +
> +static inline int get_reg_val(PVRDMADev *dev, hwaddr addr, __u32 *val)
> +{
> +    int idx = addr >> 2;
> +
> +    if (idx > RDMA_BAR1_REGS_SIZE) {
> +        return -EINVAL;
> +    }
> +
> +    *val = dev->regs_data[idx];
> +
> +    return 0;
> +}
> +static inline int set_reg_val(PVRDMADev *dev, hwaddr addr, __u32 val)
> +{
> +    int idx = addr >> 2;
> +
> +    if (idx > RDMA_BAR1_REGS_SIZE) {
> +        return -EINVAL;
> +    }
> +
> +    dev->regs_data[idx] = val;
> +
> +    return 0;
> +}
> +static inline int get_uar_val(PVRDMADev *dev, hwaddr addr, __u32 *val)
> +{
> +    int idx = addr >> 2;
> +
> +    if (idx > RDMA_BAR2_UAR_SIZE) {
> +        return -EINVAL;
> +    }
> +
> +    *val = dev->uar_data[idx];
> +
> +    return 0;
> +}
> +static inline int set_uar_val(PVRDMADev *dev, hwaddr addr, __u32 val)
> +{
> +    int idx = addr >> 2;
> +
> +    if (idx > RDMA_BAR2_UAR_SIZE) {
> +        return -EINVAL;
> +    }
> +
> +    dev->uar_data[idx] = val;
> +
> +    return 0;
> +}
> +
> +static inline void post_interrupt(PVRDMADev *dev, unsigned vector)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +    if (likely(dev->interrupt_mask == 0)) {
> +        msix_notify(pci_dev, vector);
> +    }
> +}
> +
> +int execute_command(PVRDMADev *dev);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_cmd.c b/hw/net/pvrdma/pvrdma_cmd.c
> new file mode 100644
> index 0000000..ae1ef99
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_cmd.c
> @@ -0,0 +1,322 @@
> +#include "qemu/osdep.h"
> +#include "hw/hw.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pci_ids.h"
> +#include "hw/net/pvrdma/pvrdma_utils.h"
> +#include "hw/net/pvrdma/pvrdma.h"
> +#include "hw/net/pvrdma/pvrdma_rm.h"
> +#include "hw/net/pvrdma/pvrdma_kdbr.h"
> +
> +static int query_port(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                      union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_query_port *cmd = &req->query_port;
> +    struct pvrdma_cmd_query_port_resp *resp = &rsp->query_port_resp;
> +    __u32 max_port_gids, max_port_pkeys;
> +
> +    pr_dbg("port=%d\n", cmd->port_num);
> +
> +    if (rm_get_max_port_gids(&max_port_gids) != 0) {
> +        return -ENOMEM;
> +    }
> +
> +    if (rm_get_max_port_pkeys(&max_port_pkeys) != 0) {
> +        return -ENOMEM;
> +    }
> +
> +    memset(resp, 0, sizeof(*resp));
> +    resp->hdr.response = cmd->hdr.response;
> +    resp->hdr.ack = PVRDMA_CMD_QUERY_PORT_RESP;
> +    resp->hdr.err = 0;
> +
> +    resp->attrs.state = PVRDMA_PORT_ACTIVE;
> +    resp->attrs.max_mtu = PVRDMA_MTU_4096;
> +    resp->attrs.active_mtu = PVRDMA_MTU_4096;
> +    resp->attrs.gid_tbl_len = max_port_gids;
> +    resp->attrs.port_cap_flags = 0;
> +    resp->attrs.max_msg_sz = 1024;
> +    resp->attrs.bad_pkey_cntr = 0;
> +    resp->attrs.qkey_viol_cntr = 0;
> +    resp->attrs.pkey_tbl_len = max_port_pkeys;
> +    resp->attrs.lid = 0;
> +    resp->attrs.sm_lid = 0;
> +    resp->attrs.lmc = 0;
> +    resp->attrs.max_vl_num = 0;
> +    resp->attrs.sm_sl = 0;
> +    resp->attrs.subnet_timeout = 0;
> +    resp->attrs.init_type_reply = 0;
> +    resp->attrs.active_width = 1;
> +    resp->attrs.active_speed = 1;
> +    resp->attrs.phys_state = 1;
> +
> +    return 0;
> +}
> +
> +static int query_pkey(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                      union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_query_pkey *cmd = &req->query_pkey;
> +    struct pvrdma_cmd_query_pkey_resp *resp = &rsp->query_pkey_resp;
> +
> +    pr_dbg("port=%d\n", cmd->port_num);
> +    pr_dbg("index=%d\n", cmd->index);
> +
> +    memset(resp, 0, sizeof(*resp));
> +    resp->hdr.response = cmd->hdr.response;
> +    resp->hdr.ack = PVRDMA_CMD_QUERY_PKEY_RESP;
> +    resp->hdr.err = 0;
> +
> +    resp->pkey = 0x7FFF;
> +    pr_dbg("pkey=0x%x\n", resp->pkey);
> +
> +    return 0;
> +}
> +
> +static int create_pd(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                     union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_create_pd *cmd = &req->create_pd;
> +    struct pvrdma_cmd_create_pd_resp *resp = &rsp->create_pd_resp;
> +
> +    pr_dbg("context=0x%x\n", cmd->ctx_handle ? cmd->ctx_handle : 0);
> +
> +    memset(resp, 0, sizeof(*resp));
> +    resp->hdr.response = cmd->hdr.response;
> +    resp->hdr.ack = PVRDMA_CMD_CREATE_PD_RESP;
> +    resp->hdr.err = rm_alloc_pd(dev, &resp->pd_handle, cmd->ctx_handle);
> +
> +    pr_dbg("ret=%d\n", resp->hdr.err);
> +    return resp->hdr.err;
> +}
> +
> +static int destroy_pd(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                      union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_destroy_pd *cmd = &req->destroy_pd;
> +
> +    pr_dbg("pd_handle=%d\n", cmd->pd_handle);
> +
> +    rm_dealloc_pd(dev, cmd->pd_handle);
> +
> +    return 0;
> +}
> +
> +static int create_mr(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                     union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_create_mr *cmd = &req->create_mr;
> +    struct pvrdma_cmd_create_mr_resp *resp = &rsp->create_mr_resp;
> +
> +    pr_dbg("pd_handle=%d\n", cmd->pd_handle);
> +    pr_dbg("access_flags=0x%x\n", cmd->access_flags);
> +    pr_dbg("flags=0x%x\n", cmd->flags);
> +
> +    memset(resp, 0, sizeof(*resp));
> +    resp->hdr.response = cmd->hdr.response;
> +    resp->hdr.ack = PVRDMA_CMD_CREATE_MR_RESP;
> +    resp->hdr.err = rm_alloc_mr(dev, cmd, resp);
> +
> +    pr_dbg("ret=%d\n", resp->hdr.err);
> +    return resp->hdr.err;
> +}
> +
> +static int destroy_mr(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                      union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_destroy_mr *cmd = &req->destroy_mr;
> +
> +    pr_dbg("mr_handle=%d\n", cmd->mr_handle);
> +
> +    rm_dealloc_mr(dev, cmd->mr_handle);
> +
> +    return 0;
> +}
> +
> +static int create_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                     union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_create_cq *cmd = &req->create_cq;
> +    struct pvrdma_cmd_create_cq_resp *resp = &rsp->create_cq_resp;
> +
> +    pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)cmd->pdir_dma);
> +    pr_dbg("context=0x%x\n", cmd->ctx_handle ? cmd->ctx_handle : 0);
> +    pr_dbg("cqe=%d\n", cmd->cqe);
> +    pr_dbg("nchunks=%d\n", cmd->nchunks);
> +
> +    memset(resp, 0, sizeof(*resp));
> +    resp->hdr.response = cmd->hdr.response;
> +    resp->hdr.ack = PVRDMA_CMD_CREATE_CQ_RESP;
> +    resp->hdr.err = rm_alloc_cq(dev, cmd, resp);
> +
> +    pr_dbg("ret=%d\n", resp->hdr.err);
> +    return resp->hdr.err;
> +}
> +
> +static int destroy_cq(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                      union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_destroy_cq *cmd = &req->destroy_cq;
> +
> +    pr_dbg("cq_handle=%d\n", cmd->cq_handle);
> +
> +    rm_dealloc_cq(dev, cmd->cq_handle);
> +
> +    return 0;
> +}
> +
> +static int create_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                     union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_create_qp *cmd = &req->create_qp;
> +    struct pvrdma_cmd_create_qp_resp *resp = &rsp->create_qp_resp;
> +
> +    if (!dev->ports[0].kdbr_port) {
> +        pr_dbg("First QP, registering port 0\n");
> +        dev->ports[0].kdbr_port = kdbr_alloc_port(dev);
> +        if (!dev->ports[0].kdbr_port) {
> +            pr_dbg("Fail to register port\n");
> +            return -EIO;
> +        }
> +    }
> +
> +    pr_dbg("pd_handle=%d\n", cmd->pd_handle);
> +    pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)cmd->pdir_dma);
> +    pr_dbg("total_chunks=%d\n", cmd->total_chunks);
> +    pr_dbg("send_chunks=%d\n", cmd->send_chunks);
> +
> +    memset(resp, 0, sizeof(*resp));
> +    resp->hdr.response = cmd->hdr.response;
> +    resp->hdr.ack = PVRDMA_CMD_CREATE_QP_RESP;
> +    resp->hdr.err = rm_alloc_qp(dev, cmd, resp);
> +
> +    pr_dbg("ret=%d\n", resp->hdr.err);
> +    return resp->hdr.err;
> +}
> +
> +static int modify_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                     union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_modify_qp *cmd = &req->modify_qp;
> +
> +    pr_dbg("qp_handle=%d\n", cmd->qp_handle);
> +
> +    memset(rsp, 0, sizeof(*rsp));
> +    rsp->hdr.response = cmd->hdr.response;
> +    rsp->hdr.ack = PVRDMA_CMD_MODIFY_QP_RESP;
> +    rsp->hdr.err = rm_modify_qp(dev, cmd->qp_handle, cmd);
> +
> +    pr_dbg("ret=%d\n", rsp->hdr.err);
> +    return rsp->hdr.err;
> +}
> +
> +static int destroy_qp(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                      union pvrdma_cmd_resp *rsp)
> +{
> +    struct pvrdma_cmd_destroy_qp *cmd = &req->destroy_qp;
> +
> +    pr_dbg("qp_handle=%d\n", cmd->qp_handle);
> +
> +    rm_dealloc_qp(dev, cmd->qp_handle);
> +
> +    return 0;
> +}
> +
> +static int create_bind(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                       union pvrdma_cmd_resp *rsp)
> +{
> +    int rc;
> +    struct pvrdma_cmd_create_bind *cmd = &req->create_bind;
> +    u32 max_port_gids;
> +#ifdef DEBUG
> +    __be64 *subnet = (__be64 *)&cmd->new_gid[0];
> +    __be64 *if_id = (__be64 *)&cmd->new_gid[8];
> +#endif
> +
> +    pr_dbg("index=%d\n", cmd->index);
> +
> +    rc = rm_get_max_port_gids(&max_port_gids);
> +    if (rc) {
> +        return -EIO;
> +    }
> +
> +    if (cmd->index > max_port_gids) {
> +        return -EINVAL;
> +    }
> +
> +    pr_dbg("gid[%d]=0x%llx,0x%llx\n", cmd->index, *subnet, *if_id);
> +
> +    /* Driver forces to one port only */
> +    memcpy(dev->ports[0].gid_tbl[cmd->index].raw, &cmd->new_gid,
> +           sizeof(cmd->new_gid));
> +
> +    return 0;
> +}
> +
> +static int destroy_bind(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +                        union pvrdma_cmd_resp *rsp)
> +{
> +    /*  TODO: Check the usage of this table */
> +
> +    struct pvrdma_cmd_destroy_bind *cmd = &req->destroy_bind;
> +
> +    pr_dbg("clear index %d\n", cmd->index);
> +
> +    memset(dev->ports[0].gid_tbl[cmd->index].raw, 0,
> +           sizeof(dev->ports[0].gid_tbl[cmd->index].raw));
> +
> +    return 0;
> +}
> +
> +struct cmd_handler {
> +    __u32 cmd;
> +    int (*exec)(PVRDMADev *dev, union pvrdma_cmd_req *req,
> +            union pvrdma_cmd_resp *rsp);
> +};
> +
> +static struct cmd_handler cmd_handlers[] = {
> +    {PVRDMA_CMD_QUERY_PORT, query_port},
> +    {PVRDMA_CMD_QUERY_PKEY, query_pkey},
> +    {PVRDMA_CMD_CREATE_PD, create_pd},
> +    {PVRDMA_CMD_DESTROY_PD, destroy_pd},
> +    {PVRDMA_CMD_CREATE_MR, create_mr},
> +    {PVRDMA_CMD_DESTROY_MR, destroy_mr},
> +    {PVRDMA_CMD_CREATE_CQ, create_cq},
> +    {PVRDMA_CMD_RESIZE_CQ, NULL},
> +    {PVRDMA_CMD_DESTROY_CQ, destroy_cq},
> +    {PVRDMA_CMD_CREATE_QP, create_qp},
> +    {PVRDMA_CMD_MODIFY_QP, modify_qp},
> +    {PVRDMA_CMD_QUERY_QP, NULL},
> +    {PVRDMA_CMD_DESTROY_QP, destroy_qp},
> +    {PVRDMA_CMD_CREATE_UC, NULL},
> +    {PVRDMA_CMD_DESTROY_UC, NULL},
> +    {PVRDMA_CMD_CREATE_BIND, create_bind},
> +    {PVRDMA_CMD_DESTROY_BIND, destroy_bind},
> +};
> +
> +int execute_command(PVRDMADev *dev)
> +{
> +    int err = 0xFFFF;
> +    DSRInfo *dsr_info;
> +
> +    dsr_info = &dev->dsr_info;
> +
> +    pr_dbg("cmd=%d\n", dsr_info->req->hdr.cmd);
> +    if (dsr_info->req->hdr.cmd >= sizeof(cmd_handlers) /
> +                      sizeof(struct cmd_handler)) {
> +        pr_err("Unsupported command\n");
> +        goto out;
> +    }
> +
> +    if (!cmd_handlers[dsr_info->req->hdr.cmd].exec) {
> +        pr_err("Unsupported command (not implemented yet)\n");
> +        goto out;
> +    }
> +
> +    err = cmd_handlers[dsr_info->req->hdr.cmd].exec(dev, dsr_info->req,
> +                            dsr_info->rsp);
> +out:
> +    set_reg_val(dev, PVRDMA_REG_ERR, err);
> +    post_interrupt(dev, INTR_VEC_CMD_RING);
> +
> +    return (err == 0) ? 0 : -EINVAL;
> +}
> diff --git a/hw/net/pvrdma/pvrdma_defs.h b/hw/net/pvrdma/pvrdma_defs.h
> new file mode 100644
> index 0000000..1d0cc11
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_defs.h
> @@ -0,0 +1,301 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef PVRDMA_DEFS_H
> +#define PVRDMA_DEFS_H
> +
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +
> +/*
> + * Masks and accessors for page directory, which is a two-level lookup:
> + * page directory -> page table -> page. Only one directory for now, but we
> + * could expand that easily. 9 bits for tables, 9 bits for pages, gives one
> + * gigabyte for memory regions and so forth.
> + */
> +
> +#define PVRDMA_PDIR_SHIFT        18
> +#define PVRDMA_PTABLE_SHIFT        9
> +#define PVRDMA_PAGE_DIR_DIR(x)        (((x) >> PVRDMA_PDIR_SHIFT) & 0x1)
> +#define PVRDMA_PAGE_DIR_TABLE(x)    (((x) >> PVRDMA_PTABLE_SHIFT) & 0x1ff)
> +#define PVRDMA_PAGE_DIR_PAGE(x)        ((x) & 0x1ff)
> +#define PVRDMA_PAGE_DIR_MAX_PAGES    (1 * 512 * 512)
> +#define PVRDMA_MAX_FAST_REG_PAGES    128
> +
> +/*
> + * Max MSI-X vectors.
> + */
> +
> +#define PVRDMA_MAX_INTERRUPTS    3
> +
> +/* Register offsets within PCI resource on BAR1. */
> +#define PVRDMA_REG_VERSION    0x00    /* R: Version of device. */
> +#define PVRDMA_REG_DSRLOW    0x04    /* W: Device shared region low PA. */
> +#define PVRDMA_REG_DSRHIGH    0x08    /* W: Device shared region high PA. */
> +#define PVRDMA_REG_CTL        0x0c    /* W: PVRDMA_DEVICE_CTL */
> +#define PVRDMA_REG_REQUEST    0x10    /* W: Indicate device request. */
> +#define PVRDMA_REG_ERR        0x14    /* R: Device error. */
> +#define PVRDMA_REG_ICR        0x18    /* R: Interrupt cause. */
> +#define PVRDMA_REG_IMR        0x1c    /* R/W: Interrupt mask. */
> +#define PVRDMA_REG_MACL        0x20    /* R/W: MAC address low. */
> +#define PVRDMA_REG_MACH        0x24    /* R/W: MAC address high. */
> +
> +/* Object flags. */
> +#define PVRDMA_CQ_FLAG_ARMED_SOL    BIT(0)    /* Armed for solicited-only. */
> +#define PVRDMA_CQ_FLAG_ARMED        BIT(1)    /* Armed. */
> +#define PVRDMA_MR_FLAG_DMA        BIT(0)    /* DMA region. */
> +#define PVRDMA_MR_FLAG_FRMR        BIT(1)    /* Fast reg memory region. */
> +
> +/*
> + * Atomic operation capability (masked versions are extended atomic
> + * operations.
> + */
> +
> +#define PVRDMA_ATOMIC_OP_COMP_SWAP    BIT(0) /* Compare and swap. */
> +#define PVRDMA_ATOMIC_OP_FETCH_ADD    BIT(1) /* Fetch and add. */
> +#define PVRDMA_ATOMIC_OP_MASK_COMP_SWAP    BIT(2) /* Masked compare and swap. */
> +#define PVRDMA_ATOMIC_OP_MASK_FETCH_ADD    BIT(3) /* Masked fetch and add. */
> +
> +/*
> + * Base Memory Management Extension flags to support Fast Reg Memory Regions
> + * and Fast Reg Work Requests. Each flag represents a verb operation and we
> + * must support all of them to qualify for the BMME device cap.
> + */
> +
> +#define PVRDMA_BMME_FLAG_LOCAL_INV    BIT(0) /* Local Invalidate. */
> +#define PVRDMA_BMME_FLAG_REMOTE_INV    BIT(1) /* Remote Invalidate. */
> +#define PVRDMA_BMME_FLAG_FAST_REG_WR    BIT(2) /* Fast Reg Work Request. */
> +
> +/*
> + * GID types. The interpretation of the gid_types bit field in the device
> + * capabilities will depend on the device mode. For now, the device only
> + * supports RoCE as mode, so only the different GID types for RoCE are
> + * defined.
> + */
> +
> +#define PVRDMA_GID_TYPE_FLAG_ROCE_V1 BIT(0)
> +#define PVRDMA_GID_TYPE_FLAG_ROCE_V2 BIT(1)
> +
> +enum pvrdma_pci_resource {
> +    PVRDMA_PCI_RESOURCE_MSIX,    /* BAR0: MSI-X, MMIO. */
> +    PVRDMA_PCI_RESOURCE_REG,    /* BAR1: Registers, MMIO. */
> +    PVRDMA_PCI_RESOURCE_UAR,    /* BAR2: UAR pages, MMIO, 64-bit. */
> +    PVRDMA_PCI_RESOURCE_LAST,    /* Last. */
> +};
> +
> +enum pvrdma_device_ctl {
> +    PVRDMA_DEVICE_CTL_ACTIVATE,    /* Activate device. */
> +    PVRDMA_DEVICE_CTL_QUIESCE,    /* Quiesce device. */
> +    PVRDMA_DEVICE_CTL_RESET,    /* Reset device. */
> +};
> +
> +enum pvrdma_intr_vector {
> +    PVRDMA_INTR_VECTOR_RESPONSE,    /* Command response. */
> +    PVRDMA_INTR_VECTOR_ASYNC,    /* Async events. */
> +    PVRDMA_INTR_VECTOR_CQ,        /* CQ notification. */
> +    /* Additional CQ notification vectors. */
> +};
> +
> +enum pvrdma_intr_cause {
> +    PVRDMA_INTR_CAUSE_RESPONSE    = (1 << PVRDMA_INTR_VECTOR_RESPONSE),
> +    PVRDMA_INTR_CAUSE_ASYNC        = (1 << PVRDMA_INTR_VECTOR_ASYNC),
> +    PVRDMA_INTR_CAUSE_CQ        = (1 << PVRDMA_INTR_VECTOR_CQ),
> +};
> +
> +enum pvrdma_intr_type {
> +    PVRDMA_INTR_TYPE_INTX,        /* Legacy. */
> +    PVRDMA_INTR_TYPE_MSI,        /* MSI. */
> +    PVRDMA_INTR_TYPE_MSIX,        /* MSI-X. */
> +};
> +
> +enum pvrdma_gos_bits {
> +    PVRDMA_GOS_BITS_UNK,        /* Unknown. */
> +    PVRDMA_GOS_BITS_32,        /* 32-bit. */
> +    PVRDMA_GOS_BITS_64,        /* 64-bit. */
> +};
> +
> +enum pvrdma_gos_type {
> +    PVRDMA_GOS_TYPE_UNK,        /* Unknown. */
> +    PVRDMA_GOS_TYPE_LINUX,        /* Linux. */
> +};
> +
> +enum pvrdma_device_mode {
> +    PVRDMA_DEVICE_MODE_ROCE,    /* RoCE. */
> +    PVRDMA_DEVICE_MODE_IWARP,    /* iWarp. */
> +    PVRDMA_DEVICE_MODE_IB,        /* InfiniBand. */
> +};
> +
> +struct pvrdma_gos_info {
> +    u32 gos_bits:2;            /* W: PVRDMA_GOS_BITS_ */
> +    u32 gos_type:4;            /* W: PVRDMA_GOS_TYPE_ */
> +    u32 gos_ver:16;            /* W: Guest OS version. */
> +    u32 gos_misc:10;        /* W: Other. */
> +    u32 pad;            /* Pad to 8-byte alignment. */
> +};
> +
> +struct pvrdma_device_caps {
> +    u64 fw_ver;                /* R: Query device. */
> +    __be64 node_guid;
> +    __be64 sys_image_guid;
> +    u64 max_mr_size;
> +    u64 page_size_cap;
> +    u64 atomic_arg_sizes;            /* EXP verbs. */
> +    u32 exp_comp_mask;            /* EXP verbs. */
> +    u32 device_cap_flags2;            /* EXP verbs. */
> +    u32 max_fa_bit_boundary;        /* EXP verbs. */
> +    u32 log_max_atomic_inline_arg;        /* EXP verbs. */
> +    u32 vendor_id;
> +    u32 vendor_part_id;
> +    u32 hw_ver;
> +    u32 max_qp;
> +    u32 max_qp_wr;
> +    u32 device_cap_flags;
> +    u32 max_sge;
> +    u32 max_sge_rd;
> +    u32 max_cq;
> +    u32 max_cqe;
> +    u32 max_mr;
> +    u32 max_pd;
> +    u32 max_qp_rd_atom;
> +    u32 max_ee_rd_atom;
> +    u32 max_res_rd_atom;
> +    u32 max_qp_init_rd_atom;
> +    u32 max_ee_init_rd_atom;
> +    u32 max_ee;
> +    u32 max_rdd;
> +    u32 max_mw;
> +    u32 max_raw_ipv6_qp;
> +    u32 max_raw_ethy_qp;
> +    u32 max_mcast_grp;
> +    u32 max_mcast_qp_attach;
> +    u32 max_total_mcast_qp_attach;
> +    u32 max_ah;
> +    u32 max_fmr;
> +    u32 max_map_per_fmr;
> +    u32 max_srq;
> +    u32 max_srq_wr;
> +    u32 max_srq_sge;
> +    u32 max_uar;
> +    u32 gid_tbl_len;
> +    u16 max_pkeys;
> +    u8  local_ca_ack_delay;
> +    u8  phys_port_cnt;
> +    u8  mode;                /* PVRDMA_DEVICE_MODE_ */
> +    u8  atomic_ops;                /* PVRDMA_ATOMIC_OP_* bits */
> +    u8  bmme_flags;                /* FRWR Mem Mgmt Extensions */
> +    u8  gid_types;                /* PVRDMA_GID_TYPE_FLAG_ */
> +    u8  reserved[4];
> +};
> +
> +struct pvrdma_ring_page_info {
> +    u32 num_pages;                /* Num pages incl. header. */
> +    u32 reserved;                /* Reserved. */
> +    u64 pdir_dma;                /* Page directory PA. */
> +};
> +
> +#pragma pack(push, 1)
> +
> +struct pvrdma_device_shared_region {
> +    u32 driver_version;            /* W: Driver version. */
> +    u32 pad;                /* Pad to 8-byte align. */
> +    struct pvrdma_gos_info gos_info;    /* W: Guest OS information. */
> +    u64 cmd_slot_dma;            /* W: Command slot address. */
> +    u64 resp_slot_dma;            /* W: Response slot address. */
> +    struct pvrdma_ring_page_info async_ring_pages;
> +                        /* W: Async ring page info. */
> +    struct pvrdma_ring_page_info cq_ring_pages;
> +                        /* W: CQ ring page info. */
> +    u32 uar_pfn;                /* W: UAR pageframe. */
> +    u32 pad2;                /* Pad to 8-byte align. */
> +    struct pvrdma_device_caps caps;        /* R: Device capabilities. */
> +};
> +
> +#pragma pack(pop)
> +
> +
> +/* Event types. Currently a 1:1 mapping with enum ib_event. */
> +enum pvrdma_eqe_type {
> +    PVRDMA_EVENT_CQ_ERR,
> +    PVRDMA_EVENT_QP_FATAL,
> +    PVRDMA_EVENT_QP_REQ_ERR,
> +    PVRDMA_EVENT_QP_ACCESS_ERR,
> +    PVRDMA_EVENT_COMM_EST,
> +    PVRDMA_EVENT_SQ_DRAINED,
> +    PVRDMA_EVENT_PATH_MIG,
> +    PVRDMA_EVENT_PATH_MIG_ERR,
> +    PVRDMA_EVENT_DEVICE_FATAL,
> +    PVRDMA_EVENT_PORT_ACTIVE,
> +    PVRDMA_EVENT_PORT_ERR,
> +    PVRDMA_EVENT_LID_CHANGE,
> +    PVRDMA_EVENT_PKEY_CHANGE,
> +    PVRDMA_EVENT_SM_CHANGE,
> +    PVRDMA_EVENT_SRQ_ERR,
> +    PVRDMA_EVENT_SRQ_LIMIT_REACHED,
> +    PVRDMA_EVENT_QP_LAST_WQE_REACHED,
> +    PVRDMA_EVENT_CLIENT_REREGISTER,
> +    PVRDMA_EVENT_GID_CHANGE,
> +};
> +
> +/* Event queue element. */
> +struct pvrdma_eqe {
> +    u32 type;    /* Event type. */
> +    u32 info;    /* Handle, other. */
> +};
> +
> +/* CQ notification queue element. */
> +struct pvrdma_cqne {
> +    u32 info;    /* Handle */
> +};
> +
> +static inline void pvrdma_init_cqe(struct pvrdma_cqe *cqe, u64 wr_id, u64 qp)
> +{
> +    memset(cqe, 0, sizeof(*cqe));
> +    cqe->status = PVRDMA_WC_GENERAL_ERR;
> +    cqe->wr_id = wr_id;
> +    cqe->qp = qp;
> +}
> +
> +#endif /* PVRDMA_DEFS_H */
> diff --git a/hw/net/pvrdma/pvrdma_dev_api.h b/hw/net/pvrdma/pvrdma_dev_api.h
> new file mode 100644
> index 0000000..4887b96
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_dev_api.h
> @@ -0,0 +1,342 @@
> +/*
> + * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of EITHER the GNU General Public License
> + * version 2 as published by the Free Software Foundation or the BSD
> + * 2-Clause License. This program is distributed in the hope that it
> + * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
> + * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU General Public License version 2 for more details at
> + * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program available in the file COPYING in the main
> + * directory of this source tree.
> + *
> + * The BSD 2-Clause License
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef PVRDMA_DEV_API_H
> +#define PVRDMA_DEV_API_H
> +
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +
> +enum {
> +    PVRDMA_CMD_FIRST,
> +    PVRDMA_CMD_QUERY_PORT = PVRDMA_CMD_FIRST,
> +    PVRDMA_CMD_QUERY_PKEY,
> +    PVRDMA_CMD_CREATE_PD,
> +    PVRDMA_CMD_DESTROY_PD,
> +    PVRDMA_CMD_CREATE_MR,
> +    PVRDMA_CMD_DESTROY_MR,
> +    PVRDMA_CMD_CREATE_CQ,
> +    PVRDMA_CMD_RESIZE_CQ,
> +    PVRDMA_CMD_DESTROY_CQ,
> +    PVRDMA_CMD_CREATE_QP,
> +    PVRDMA_CMD_MODIFY_QP,
> +    PVRDMA_CMD_QUERY_QP,
> +    PVRDMA_CMD_DESTROY_QP,
> +    PVRDMA_CMD_CREATE_UC,
> +    PVRDMA_CMD_DESTROY_UC,
> +    PVRDMA_CMD_CREATE_BIND,
> +    PVRDMA_CMD_DESTROY_BIND,
> +    PVRDMA_CMD_MAX,
> +};
> +
> +enum {
> +    PVRDMA_CMD_FIRST_RESP = (1 << 31),
> +    PVRDMA_CMD_QUERY_PORT_RESP = PVRDMA_CMD_FIRST_RESP,
> +    PVRDMA_CMD_QUERY_PKEY_RESP,
> +    PVRDMA_CMD_CREATE_PD_RESP,
> +    PVRDMA_CMD_DESTROY_PD_RESP_NOOP,
> +    PVRDMA_CMD_CREATE_MR_RESP,
> +    PVRDMA_CMD_DESTROY_MR_RESP_NOOP,
> +    PVRDMA_CMD_CREATE_CQ_RESP,
> +    PVRDMA_CMD_RESIZE_CQ_RESP,
> +    PVRDMA_CMD_DESTROY_CQ_RESP_NOOP,
> +    PVRDMA_CMD_CREATE_QP_RESP,
> +    PVRDMA_CMD_MODIFY_QP_RESP,
> +    PVRDMA_CMD_QUERY_QP_RESP,
> +    PVRDMA_CMD_DESTROY_QP_RESP,
> +    PVRDMA_CMD_CREATE_UC_RESP,
> +    PVRDMA_CMD_DESTROY_UC_RESP_NOOP,
> +    PVRDMA_CMD_CREATE_BIND_RESP_NOOP,
> +    PVRDMA_CMD_DESTROY_BIND_RESP_NOOP,
> +    PVRDMA_CMD_MAX_RESP,
> +};
> +
> +struct pvrdma_cmd_hdr {
> +    u64 response;        /* Key for response lookup. */
> +    u32 cmd;        /* PVRDMA_CMD_ */
> +    u32 reserved;        /* Reserved. */
> +};
> +
> +struct pvrdma_cmd_resp_hdr {
> +    u64 response;        /* From cmd hdr. */
> +    u32 ack;        /* PVRDMA_CMD_XXX_RESP */
> +    u8 err;            /* Error. */
> +    u8 reserved[3];        /* Reserved. */
> +};
> +
> +struct pvrdma_cmd_query_port {
> +    struct pvrdma_cmd_hdr hdr;
> +    u8 port_num;
> +    u8 reserved[7];
> +};
> +
> +struct pvrdma_cmd_query_port_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    struct pvrdma_port_attr attrs;
> +};
> +
> +struct pvrdma_cmd_query_pkey {
> +    struct pvrdma_cmd_hdr hdr;
> +    u8 port_num;
> +    u8 index;
> +    u8 reserved[6];
> +};
> +
> +struct pvrdma_cmd_query_pkey_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u16 pkey;
> +    u8 reserved[6];
> +};
> +
> +struct pvrdma_cmd_create_uc {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 pfn; /* UAR page frame number */
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_uc_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 ctx_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_uc {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 ctx_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_pd {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 ctx_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_pd_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 pd_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_pd {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 pd_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_mr {
> +    struct pvrdma_cmd_hdr hdr;
> +    u64 start;
> +    u64 length;
> +    u64 pdir_dma;
> +    u32 pd_handle;
> +    u32 access_flags;
> +    u32 flags;
> +    u32 nchunks;
> +};
> +
> +struct pvrdma_cmd_create_mr_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 mr_handle;
> +    u32 lkey;
> +    u32 rkey;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_mr {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 mr_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_cq {
> +    struct pvrdma_cmd_hdr hdr;
> +    u64 pdir_dma;
> +    u32 ctx_handle;
> +    u32 cqe;
> +    u32 nchunks;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_cq_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 cq_handle;
> +    u32 cqe;
> +};
> +
> +struct pvrdma_cmd_resize_cq {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 cq_handle;
> +    u32 cqe;
> +};
> +
> +struct pvrdma_cmd_resize_cq_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 cqe;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_cq {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 cq_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_qp {
> +    struct pvrdma_cmd_hdr hdr;
> +    u64 pdir_dma;
> +    u32 pd_handle;
> +    u32 send_cq_handle;
> +    u32 recv_cq_handle;
> +    u32 srq_handle;
> +    u32 max_send_wr;
> +    u32 max_recv_wr;
> +    u32 max_send_sge;
> +    u32 max_recv_sge;
> +    u32 max_inline_data;
> +    u32 lkey;
> +    u32 access_flags;
> +    u16 total_chunks;
> +    u16 send_chunks;
> +    u16 max_atomic_arg;
> +    u8 sq_sig_all;
> +    u8 qp_type;
> +    u8 is_srq;
> +    u8 reserved[3];
> +};
> +
> +struct pvrdma_cmd_create_qp_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 qpn;
> +    u32 max_send_wr;
> +    u32 max_recv_wr;
> +    u32 max_send_sge;
> +    u32 max_recv_sge;
> +    u32 max_inline_data;
> +};
> +
> +struct pvrdma_cmd_modify_qp {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 qp_handle;
> +    u32 attr_mask;
> +    struct pvrdma_qp_attr attrs;
> +};
> +
> +struct pvrdma_cmd_query_qp {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 qp_handle;
> +    u32 attr_mask;
> +};
> +
> +struct pvrdma_cmd_query_qp_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    struct pvrdma_qp_attr attrs;
> +};
> +
> +struct pvrdma_cmd_destroy_qp {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 qp_handle;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_destroy_qp_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    u32 events_reported;
> +    u8 reserved[4];
> +};
> +
> +struct pvrdma_cmd_create_bind {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 mtu;
> +    u32 vlan;
> +    u32 index;
> +    u8 new_gid[16];
> +    u8 gid_type;
> +    u8 reserved[3];
> +};
> +
> +struct pvrdma_cmd_destroy_bind {
> +    struct pvrdma_cmd_hdr hdr;
> +    u32 index;
> +    u8 dest_gid[16];
> +    u8 reserved[4];
> +};
> +
> +union pvrdma_cmd_req {
> +    struct pvrdma_cmd_hdr hdr;
> +    struct pvrdma_cmd_query_port query_port;
> +    struct pvrdma_cmd_query_pkey query_pkey;
> +    struct pvrdma_cmd_create_uc create_uc;
> +    struct pvrdma_cmd_destroy_uc destroy_uc;
> +    struct pvrdma_cmd_create_pd create_pd;
> +    struct pvrdma_cmd_destroy_pd destroy_pd;
> +    struct pvrdma_cmd_create_mr create_mr;
> +    struct pvrdma_cmd_destroy_mr destroy_mr;
> +    struct pvrdma_cmd_create_cq create_cq;
> +    struct pvrdma_cmd_resize_cq resize_cq;
> +    struct pvrdma_cmd_destroy_cq destroy_cq;
> +    struct pvrdma_cmd_create_qp create_qp;
> +    struct pvrdma_cmd_modify_qp modify_qp;
> +    struct pvrdma_cmd_query_qp query_qp;
> +    struct pvrdma_cmd_destroy_qp destroy_qp;
> +    struct pvrdma_cmd_create_bind create_bind;
> +    struct pvrdma_cmd_destroy_bind destroy_bind;
> +};
> +
> +union pvrdma_cmd_resp {
> +    struct pvrdma_cmd_resp_hdr hdr;
> +    struct pvrdma_cmd_query_port_resp query_port_resp;
> +    struct pvrdma_cmd_query_pkey_resp query_pkey_resp;
> +    struct pvrdma_cmd_create_uc_resp create_uc_resp;
> +    struct pvrdma_cmd_create_pd_resp create_pd_resp;
> +    struct pvrdma_cmd_create_mr_resp create_mr_resp;
> +    struct pvrdma_cmd_create_cq_resp create_cq_resp;
> +    struct pvrdma_cmd_resize_cq_resp resize_cq_resp;
> +    struct pvrdma_cmd_create_qp_resp create_qp_resp;
> +    struct pvrdma_cmd_query_qp_resp query_qp_resp;
> +    struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp;
> +};
> +
> +#endif /* PVRDMA_DEV_API_H */
> diff --git a/hw/net/pvrdma/pvrdma_ib_verbs.h b/hw/net/pvrdma/pvrdma_ib_verbs.h
> new file mode 100644
> index 0000000..e2a23f3
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_ib_verbs.h
> @@ -0,0 +1,469 @@
> +/*
> + * [PLEASE NOTE:  VMWARE, INC. ELECTS TO USE AND DISTRIBUTE THIS COMPONENT
> + * UNDER THE TERMS OF THE OpenIB.org BSD license.  THE ORIGINAL LICENSE TERMS
> + * ARE REPRODUCED BELOW ONLY AS A REFERENCE.]
> + *
> + * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
> + * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
> + * Copyright (c) 2004 Intel Corporation.  All rights reserved.
> + * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
> + * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
> + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
> + * Copyright (c) 2005, 2006, 2007 Cisco Systems.  All rights reserved.
> + * Copyright (c) 2015-2016 VMware, Inc.  All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#ifndef PVRDMA_IB_VERBS_H
> +#define PVRDMA_IB_VERBS_H
> +
> +#include <linux/types.h>
> +
> +union pvrdma_gid {
> +    u8    raw[16];
> +    struct {
> +        __be64    subnet_prefix;
> +        __be64    interface_id;
> +    } global;
> +};
> +
> +enum pvrdma_link_layer {
> +    PVRDMA_LINK_LAYER_UNSPECIFIED,
> +    PVRDMA_LINK_LAYER_INFINIBAND,
> +    PVRDMA_LINK_LAYER_ETHERNET,
> +};
> +
> +enum pvrdma_mtu {
> +    PVRDMA_MTU_256  = 1,
> +    PVRDMA_MTU_512  = 2,
> +    PVRDMA_MTU_1024 = 3,
> +    PVRDMA_MTU_2048 = 4,
> +    PVRDMA_MTU_4096 = 5,
> +};
> +
> +static inline int pvrdma_mtu_enum_to_int(enum pvrdma_mtu mtu)
> +{
> +    switch (mtu) {
> +    case PVRDMA_MTU_256:    return  256;
> +    case PVRDMA_MTU_512:    return  512;
> +    case PVRDMA_MTU_1024:    return 1024;
> +    case PVRDMA_MTU_2048:    return 2048;
> +    case PVRDMA_MTU_4096:    return 4096;
> +    default:        return   -1;
> +    }
> +}
> +
> +static inline enum pvrdma_mtu pvrdma_mtu_int_to_enum(int mtu)
> +{
> +    switch (mtu) {
> +    case 256:    return PVRDMA_MTU_256;
> +    case 512:    return PVRDMA_MTU_512;
> +    case 1024:    return PVRDMA_MTU_1024;
> +    case 2048:    return PVRDMA_MTU_2048;
> +    case 4096:
> +    default:    return PVRDMA_MTU_4096;
> +    }
> +}
> +
> +enum pvrdma_port_state {
> +    PVRDMA_PORT_NOP            = 0,
> +    PVRDMA_PORT_DOWN        = 1,
> +    PVRDMA_PORT_INIT        = 2,
> +    PVRDMA_PORT_ARMED        = 3,
> +    PVRDMA_PORT_ACTIVE        = 4,
> +    PVRDMA_PORT_ACTIVE_DEFER    = 5,
> +};
> +
> +enum pvrdma_port_cap_flags {
> +    PVRDMA_PORT_SM                = 1 <<  1,
> +    PVRDMA_PORT_NOTICE_SUP            = 1 <<  2,
> +    PVRDMA_PORT_TRAP_SUP            = 1 <<  3,
> +    PVRDMA_PORT_OPT_IPD_SUP            = 1 <<  4,
> +    PVRDMA_PORT_AUTO_MIGR_SUP        = 1 <<  5,
> +    PVRDMA_PORT_SL_MAP_SUP            = 1 <<  6,
> +    PVRDMA_PORT_MKEY_NVRAM            = 1 <<  7,
> +    PVRDMA_PORT_PKEY_NVRAM            = 1 <<  8,
> +    PVRDMA_PORT_LED_INFO_SUP        = 1 <<  9,
> +    PVRDMA_PORT_SM_DISABLED            = 1 << 10,
> +    PVRDMA_PORT_SYS_IMAGE_GUID_SUP        = 1 << 11,
> +    PVRDMA_PORT_PKEY_SW_EXT_PORT_TRAP_SUP    = 1 << 12,
> +    PVRDMA_PORT_EXTENDED_SPEEDS_SUP        = 1 << 14,
> +    PVRDMA_PORT_CM_SUP            = 1 << 16,
> +    PVRDMA_PORT_SNMP_TUNNEL_SUP        = 1 << 17,
> +    PVRDMA_PORT_REINIT_SUP            = 1 << 18,
> +    PVRDMA_PORT_DEVICE_MGMT_SUP        = 1 << 19,
> +    PVRDMA_PORT_VENDOR_CLASS_SUP        = 1 << 20,
> +    PVRDMA_PORT_DR_NOTICE_SUP        = 1 << 21,
> +    PVRDMA_PORT_CAP_MASK_NOTICE_SUP        = 1 << 22,
> +    PVRDMA_PORT_BOOT_MGMT_SUP        = 1 << 23,
> +    PVRDMA_PORT_LINK_LATENCY_SUP        = 1 << 24,
> +    PVRDMA_PORT_CLIENT_REG_SUP        = 1 << 25,
> +    PVRDMA_PORT_IP_BASED_GIDS        = 1 << 26,
> +    PVRDMA_PORT_CAP_FLAGS_MAX        = PVRDMA_PORT_IP_BASED_GIDS,
> +};
> +
> +enum pvrdma_port_width {
> +    PVRDMA_WIDTH_1X        = 1,
> +    PVRDMA_WIDTH_4X        = 2,
> +    PVRDMA_WIDTH_8X        = 4,
> +    PVRDMA_WIDTH_12X    = 8,
> +};
> +
> +static inline int pvrdma_width_enum_to_int(enum pvrdma_port_width width)
> +{
> +    switch (width) {
> +    case PVRDMA_WIDTH_1X:    return  1;
> +    case PVRDMA_WIDTH_4X:    return  4;
> +    case PVRDMA_WIDTH_8X:    return  8;
> +    case PVRDMA_WIDTH_12X:    return 12;
> +    default:        return -1;
> +    }
> +}
> +
> +enum pvrdma_port_speed {
> +    PVRDMA_SPEED_SDR    = 1,
> +    PVRDMA_SPEED_DDR    = 2,
> +    PVRDMA_SPEED_QDR    = 4,
> +    PVRDMA_SPEED_FDR10    = 8,
> +    PVRDMA_SPEED_FDR    = 16,
> +    PVRDMA_SPEED_EDR    = 32,
> +};
> +
> +struct pvrdma_port_attr {
> +    enum pvrdma_port_state    state;
> +    enum pvrdma_mtu        max_mtu;
> +    enum pvrdma_mtu        active_mtu;
> +    u32            gid_tbl_len;
> +    u32            port_cap_flags;
> +    u32            max_msg_sz;
> +    u32            bad_pkey_cntr;
> +    u32            qkey_viol_cntr;
> +    u16            pkey_tbl_len;
> +    u16            lid;
> +    u16            sm_lid;
> +    u8            lmc;
> +    u8            max_vl_num;
> +    u8            sm_sl;
> +    u8            subnet_timeout;
> +    u8            init_type_reply;
> +    u8            active_width;
> +    u8            active_speed;
> +    u8            phys_state;
> +    u8            reserved[2];
> +};
> +
> +struct pvrdma_global_route {
> +    union pvrdma_gid    dgid;
> +    u32            flow_label;
> +    u8            sgid_index;
> +    u8            hop_limit;
> +    u8            traffic_class;
> +    u8            reserved;
> +};
> +
> +struct pvrdma_grh {
> +    __be32            version_tclass_flow;
> +    __be16            paylen;
> +    u8            next_hdr;
> +    u8            hop_limit;
> +    union pvrdma_gid    sgid;
> +    union pvrdma_gid    dgid;
> +};
> +
> +enum pvrdma_ah_flags {
> +    PVRDMA_AH_GRH = 1,
> +};
> +
> +enum pvrdma_rate {
> +    PVRDMA_RATE_PORT_CURRENT    = 0,
> +    PVRDMA_RATE_2_5_GBPS        = 2,
> +    PVRDMA_RATE_5_GBPS        = 5,
> +    PVRDMA_RATE_10_GBPS        = 3,
> +    PVRDMA_RATE_20_GBPS        = 6,
> +    PVRDMA_RATE_30_GBPS        = 4,
> +    PVRDMA_RATE_40_GBPS        = 7,
> +    PVRDMA_RATE_60_GBPS        = 8,
> +    PVRDMA_RATE_80_GBPS        = 9,
> +    PVRDMA_RATE_120_GBPS        = 10,
> +    PVRDMA_RATE_14_GBPS        = 11,
> +    PVRDMA_RATE_56_GBPS        = 12,
> +    PVRDMA_RATE_112_GBPS        = 13,
> +    PVRDMA_RATE_168_GBPS        = 14,
> +    PVRDMA_RATE_25_GBPS        = 15,
> +    PVRDMA_RATE_100_GBPS        = 16,
> +    PVRDMA_RATE_200_GBPS        = 17,
> +    PVRDMA_RATE_300_GBPS        = 18,
> +};
> +
> +struct pvrdma_ah_attr {
> +    struct pvrdma_global_route    grh;
> +    u16                dlid;
> +    u16                vlan_id;
> +    u8                sl;
> +    u8                src_path_bits;
> +    u8                static_rate;
> +    u8                ah_flags;
> +    u8                port_num;
> +    u8                dmac[6];
> +    u8                reserved;
> +};
> +
> +enum pvrdma_wc_status {
> +    PVRDMA_WC_SUCCESS,
> +    PVRDMA_WC_LOC_LEN_ERR,
> +    PVRDMA_WC_LOC_QP_OP_ERR,
> +    PVRDMA_WC_LOC_EEC_OP_ERR,
> +    PVRDMA_WC_LOC_PROT_ERR,
> +    PVRDMA_WC_WR_FLUSH_ERR,
> +    PVRDMA_WC_MW_BIND_ERR,
> +    PVRDMA_WC_BAD_RESP_ERR,
> +    PVRDMA_WC_LOC_ACCESS_ERR,
> +    PVRDMA_WC_REM_INV_REQ_ERR,
> +    PVRDMA_WC_REM_ACCESS_ERR,
> +    PVRDMA_WC_REM_OP_ERR,
> +    PVRDMA_WC_RETRY_EXC_ERR,
> +    PVRDMA_WC_RNR_RETRY_EXC_ERR,
> +    PVRDMA_WC_LOC_RDD_VIOL_ERR,
> +    PVRDMA_WC_REM_INV_RD_REQ_ERR,
> +    PVRDMA_WC_REM_ABORT_ERR,
> +    PVRDMA_WC_INV_EECN_ERR,
> +    PVRDMA_WC_INV_EEC_STATE_ERR,
> +    PVRDMA_WC_FATAL_ERR,
> +    PVRDMA_WC_RESP_TIMEOUT_ERR,
> +    PVRDMA_WC_GENERAL_ERR,
> +};
> +
> +enum pvrdma_wc_opcode {
> +    PVRDMA_WC_SEND,
> +    PVRDMA_WC_RDMA_WRITE,
> +    PVRDMA_WC_RDMA_READ,
> +    PVRDMA_WC_COMP_SWAP,
> +    PVRDMA_WC_FETCH_ADD,
> +    PVRDMA_WC_BIND_MW,
> +    PVRDMA_WC_LSO,
> +    PVRDMA_WC_LOCAL_INV,
> +    PVRDMA_WC_FAST_REG_MR,
> +    PVRDMA_WC_MASKED_COMP_SWAP,
> +    PVRDMA_WC_MASKED_FETCH_ADD,
> +    PVRDMA_WC_RECV = 1 << 7,
> +    PVRDMA_WC_RECV_RDMA_WITH_IMM,
> +};
> +
> +enum pvrdma_wc_flags {
> +    PVRDMA_WC_GRH            = 1 << 0,
> +    PVRDMA_WC_WITH_IMM        = 1 << 1,
> +    PVRDMA_WC_WITH_INVALIDATE    = 1 << 2,
> +    PVRDMA_WC_IP_CSUM_OK        = 1 << 3,
> +    PVRDMA_WC_WITH_SMAC        = 1 << 4,
> +    PVRDMA_WC_WITH_VLAN        = 1 << 5,
> +    PVRDMA_WC_FLAGS_MAX        = PVRDMA_WC_WITH_VLAN,
> +};
> +
> +enum pvrdma_cq_notify_flags {
> +    PVRDMA_CQ_SOLICITED        = 1 << 0,
> +    PVRDMA_CQ_NEXT_COMP        = 1 << 1,
> +    PVRDMA_CQ_SOLICITED_MASK    = PVRDMA_CQ_SOLICITED |
> +                      PVRDMA_CQ_NEXT_COMP,
> +    PVRDMA_CQ_REPORT_MISSED_EVENTS    = 1 << 2,
> +};
> +
> +struct pvrdma_qp_cap {
> +    u32    max_send_wr;
> +    u32    max_recv_wr;
> +    u32    max_send_sge;
> +    u32    max_recv_sge;
> +    u32    max_inline_data;
> +    u32    reserved;
> +};
> +
> +enum pvrdma_sig_type {
> +    PVRDMA_SIGNAL_ALL_WR,
> +    PVRDMA_SIGNAL_REQ_WR,
> +};
> +
> +enum pvrdma_qp_type {
> +    PVRDMA_QPT_SMI,
> +    PVRDMA_QPT_GSI,
> +    PVRDMA_QPT_RC,
> +    PVRDMA_QPT_UC,
> +    PVRDMA_QPT_UD,
> +    PVRDMA_QPT_RAW_IPV6,
> +    PVRDMA_QPT_RAW_ETHERTYPE,
> +    PVRDMA_QPT_RAW_PACKET = 8,
> +    PVRDMA_QPT_XRC_INI = 9,
> +    PVRDMA_QPT_XRC_TGT,
> +    PVRDMA_QPT_MAX,
> +};
> +
> +enum pvrdma_qp_create_flags {
> +    PVRDMA_QP_CREATE_IPOPVRDMA_UD_LSO        = 1 << 0,
> +    PVRDMA_QP_CREATE_BLOCK_MULTICAST_LOOPBACK    = 1 << 1,
> +};
> +
> +enum pvrdma_qp_attr_mask {
> +    PVRDMA_QP_STATE            = 1 << 0,
> +    PVRDMA_QP_CUR_STATE        = 1 << 1,
> +    PVRDMA_QP_EN_SQD_ASYNC_NOTIFY    = 1 << 2,
> +    PVRDMA_QP_ACCESS_FLAGS        = 1 << 3,
> +    PVRDMA_QP_PKEY_INDEX        = 1 << 4,
> +    PVRDMA_QP_PORT            = 1 << 5,
> +    PVRDMA_QP_QKEY            = 1 << 6,
> +    PVRDMA_QP_AV            = 1 << 7,
> +    PVRDMA_QP_PATH_MTU        = 1 << 8,
> +    PVRDMA_QP_TIMEOUT        = 1 << 9,
> +    PVRDMA_QP_RETRY_CNT        = 1 << 10,
> +    PVRDMA_QP_RNR_RETRY        = 1 << 11,
> +    PVRDMA_QP_RQ_PSN        = 1 << 12,
> +    PVRDMA_QP_MAX_QP_RD_ATOMIC    = 1 << 13,
> +    PVRDMA_QP_ALT_PATH        = 1 << 14,
> +    PVRDMA_QP_MIN_RNR_TIMER        = 1 << 15,
> +    PVRDMA_QP_SQ_PSN        = 1 << 16,
> +    PVRDMA_QP_MAX_DEST_RD_ATOMIC    = 1 << 17,
> +    PVRDMA_QP_PATH_MIG_STATE    = 1 << 18,
> +    PVRDMA_QP_CAP            = 1 << 19,
> +    PVRDMA_QP_DEST_QPN        = 1 << 20,
> +    PVRDMA_QP_ATTR_MASK_MAX        = PVRDMA_QP_DEST_QPN,
> +};
> +
> +enum pvrdma_qp_state {
> +    PVRDMA_QPS_RESET,
> +    PVRDMA_QPS_INIT,
> +    PVRDMA_QPS_RTR,
> +    PVRDMA_QPS_RTS,
> +    PVRDMA_QPS_SQD,
> +    PVRDMA_QPS_SQE,
> +    PVRDMA_QPS_ERR,
> +};
> +
> +enum pvrdma_mig_state {
> +    PVRDMA_MIG_MIGRATED,
> +    PVRDMA_MIG_REARM,
> +    PVRDMA_MIG_ARMED,
> +};
> +
> +enum pvrdma_mw_type {
> +    PVRDMA_MW_TYPE_1 = 1,
> +    PVRDMA_MW_TYPE_2 = 2,
> +};
> +
> +struct pvrdma_qp_attr {
> +    enum pvrdma_qp_state    qp_state;
> +    enum pvrdma_qp_state    cur_qp_state;
> +    enum pvrdma_mtu        path_mtu;
> +    enum pvrdma_mig_state    path_mig_state;
> +    u32            qkey;
> +    u32            rq_psn;
> +    u32            sq_psn;
> +    u32            dest_qp_num;
> +    u32            qp_access_flags;
> +    u16            pkey_index;
> +    u16            alt_pkey_index;
> +    u8            en_sqd_async_notify;
> +    u8            sq_draining;
> +    u8            max_rd_atomic;
> +    u8            max_dest_rd_atomic;
> +    u8            min_rnr_timer;
> +    u8            port_num;
> +    u8            timeout;
> +    u8            retry_cnt;
> +    u8            rnr_retry;
> +    u8            alt_port_num;
> +    u8            alt_timeout;
> +    u8            reserved[5];
> +    struct pvrdma_qp_cap    cap;
> +    struct pvrdma_ah_attr    ah_attr;
> +    struct pvrdma_ah_attr    alt_ah_attr;
> +};
> +
> +enum pvrdma_wr_opcode {
> +    PVRDMA_WR_RDMA_WRITE,
> +    PVRDMA_WR_RDMA_WRITE_WITH_IMM,
> +    PVRDMA_WR_SEND,
> +    PVRDMA_WR_SEND_WITH_IMM,
> +    PVRDMA_WR_RDMA_READ,
> +    PVRDMA_WR_ATOMIC_CMP_AND_SWP,
> +    PVRDMA_WR_ATOMIC_FETCH_AND_ADD,
> +    PVRDMA_WR_LSO,
> +    PVRDMA_WR_SEND_WITH_INV,
> +    PVRDMA_WR_RDMA_READ_WITH_INV,
> +    PVRDMA_WR_LOCAL_INV,
> +    PVRDMA_WR_FAST_REG_MR,
> +    PVRDMA_WR_MASKED_ATOMIC_CMP_AND_SWP,
> +    PVRDMA_WR_MASKED_ATOMIC_FETCH_AND_ADD,
> +    PVRDMA_WR_BIND_MW,
> +    PVRDMA_WR_REG_SIG_MR,
> +};
> +
> +enum pvrdma_send_flags {
> +    PVRDMA_SEND_FENCE    = 1 << 0,
> +    PVRDMA_SEND_SIGNALED    = 1 << 1,
> +    PVRDMA_SEND_SOLICITED    = 1 << 2,
> +    PVRDMA_SEND_INLINE    = 1 << 3,
> +    PVRDMA_SEND_IP_CSUM    = 1 << 4,
> +    PVRDMA_SEND_FLAGS_MAX    = PVRDMA_SEND_IP_CSUM,
> +};
> +
> +enum pvrdma_access_flags {
> +    PVRDMA_ACCESS_LOCAL_WRITE    = 1 << 0,
> +    PVRDMA_ACCESS_REMOTE_WRITE    = 1 << 1,
> +    PVRDMA_ACCESS_REMOTE_READ    = 1 << 2,
> +    PVRDMA_ACCESS_REMOTE_ATOMIC    = 1 << 3,
> +    PVRDMA_ACCESS_MW_BIND        = 1 << 4,
> +    PVRDMA_ZERO_BASED        = 1 << 5,
> +    PVRDMA_ACCESS_ON_DEMAND        = 1 << 6,
> +    PVRDMA_ACCESS_FLAGS_MAX        = PVRDMA_ACCESS_ON_DEMAND,
> +};
> +
> +enum ib_wc_status {
> +    IB_WC_SUCCESS,
> +    IB_WC_LOC_LEN_ERR,
> +    IB_WC_LOC_QP_OP_ERR,
> +    IB_WC_LOC_EEC_OP_ERR,
> +    IB_WC_LOC_PROT_ERR,
> +    IB_WC_WR_FLUSH_ERR,
> +    IB_WC_MW_BIND_ERR,
> +    IB_WC_BAD_RESP_ERR,
> +    IB_WC_LOC_ACCESS_ERR,
> +    IB_WC_REM_INV_REQ_ERR,
> +    IB_WC_REM_ACCESS_ERR,
> +    IB_WC_REM_OP_ERR,
> +    IB_WC_RETRY_EXC_ERR,
> +    IB_WC_RNR_RETRY_EXC_ERR,
> +    IB_WC_LOC_RDD_VIOL_ERR,
> +    IB_WC_REM_INV_RD_REQ_ERR,
> +    IB_WC_REM_ABORT_ERR,
> +    IB_WC_INV_EECN_ERR,
> +    IB_WC_INV_EEC_STATE_ERR,
> +    IB_WC_FATAL_ERR,
> +    IB_WC_RESP_TIMEOUT_ERR,
> +    IB_WC_GENERAL_ERR
> +};
> +
> +#endif /* PVRDMA_IB_VERBS_H */
> diff --git a/hw/net/pvrdma/pvrdma_kdbr.c b/hw/net/pvrdma/pvrdma_kdbr.c
> new file mode 100644
> index 0000000..ec04afd
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_kdbr.c
> @@ -0,0 +1,395 @@
> +#include <qemu/osdep.h>
> +#include <hw/pci/pci.h>
> +
> +#include <sys/ioctl.h>
> +
> +#include <hw/net/pvrdma/pvrdma.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/pvrdma_kdbr.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +#include <hw/net/pvrdma/kdbr.h>
> +
> +int kdbr_fd = -1;
> +
> +#define MAX_CONSEQ_CQES_READ 10
> +
> +typedef struct KdbrCtx {
> +    struct kdbr_req req;
> +    void *up_ctx;
> +    bool is_tx_req;
> +} KdbrCtx;
> +
> +static void (*tx_comp_handler)(int status, unsigned int vendor_err,
> +                               void *ctx) = 0;
> +static void (*rx_comp_handler)(int status, unsigned int vendor_err,
> +                               void *ctx) = 0;
> +
> +static void kdbr_err_to_pvrdma_err(int kdbr_status, unsigned int *status,
> +                                   unsigned int *vendor_err)
> +{
> +    if (kdbr_status == 0) {
> +        *status = IB_WC_SUCCESS;
> +        *vendor_err = 0;
> +        return;
> +    }
> +
> +    *vendor_err = kdbr_status;
> +    switch (kdbr_status) {
> +    case KDBR_ERR_CODE_EMPTY_VEC:
> +        *status = IB_WC_LOC_LEN_ERR;
> +        break;
> +    case KDBR_ERR_CODE_NO_MORE_RECV_BUF:
> +        *status = IB_WC_REM_OP_ERR;
> +        break;
> +    case KDBR_ERR_CODE_RECV_BUF_PROT:
> +        *status = IB_WC_REM_ACCESS_ERR;
> +        break;
> +    case KDBR_ERR_CODE_INV_ADDR:
> +        *status = IB_WC_LOC_ACCESS_ERR;
> +        break;
> +    case KDBR_ERR_CODE_INV_CONN_ID:
> +        *status = IB_WC_LOC_PROT_ERR;
> +        break;
> +    case KDBR_ERR_CODE_NO_PEER:
> +        *status = IB_WC_LOC_QP_OP_ERR;
> +        break;
> +    default:
> +        *status = IB_WC_GENERAL_ERR;
> +        break;
> +    }
> +}
> +
> +static void *comp_handler_thread(void *arg)
> +{
> +    KdbrPort *port = (KdbrPort *)arg;
> +    struct kdbr_completion comp[MAX_CONSEQ_CQES_READ];
> +    int i, j, rc;
> +    KdbrCtx *sctx;
> +    unsigned int status, vendor_err;
> +
> +    while (port->comp_thread.run) {
> +        rc = read(port->fd, &comp, sizeof(comp));
> +        if (unlikely(rc % sizeof(struct kdbr_completion))) {
> +            pr_err("Got unsupported message size (%d) from kdbr\n", rc);
> +            continue;
> +        }
> +        pr_dbg("Processing %ld CQEs from kdbr\n",
> +               rc / sizeof(struct kdbr_completion));
> +
> +        for (i = 0; i < rc / sizeof(struct kdbr_completion); i++) {
> +            pr_dbg("comp.req_id=%ld\n", comp[i].req_id);
> +            pr_dbg("comp.status=%d\n", comp[i].status);
> +
> +            sctx = rm_get_wqe_ctx(PVRDMA_DEV(port->dev), comp[i].req_id);
> +            if (!sctx) {
> +                pr_err("Fail to find ctx for req %ld\n", comp[i].req_id);
> +                continue;
> +            }
> +            pr_dbg("Processing %s CQE\n", sctx->is_tx_req ? "send" : "recv");
> +
> +            for (j = 0; j < sctx->req.vlen; j++) {
> +                pr_dbg("payload=%s\n", (char *)sctx->req.vec[j].iov_base);
> +                pvrdma_pci_dma_unmap(port->dev, sctx->req.vec[j].iov_base,
> +                                     sctx->req.vec[j].iov_len);
> +            }
> +
> +            kdbr_err_to_pvrdma_err(comp[i].status, &status, &vendor_err);
> +            pr_dbg("status=%d\n", status);
> +            pr_dbg("vendor_err=0x%x\n", vendor_err);
> +
> +            if (sctx->is_tx_req) {
> +                tx_comp_handler(status, vendor_err, sctx->up_ctx);
> +            } else {
> +                rx_comp_handler(status, vendor_err, sctx->up_ctx);
> +            }
> +
> +            rm_dealloc_wqe_ctx(PVRDMA_DEV(port->dev), comp[i].req_id);
> +            free(sctx);
> +        }
> +    }
> +
> +    pr_dbg("Going down\n");
> +
> +    return NULL;
> +}
> +
> +KdbrPort *kdbr_alloc_port(PVRDMADev *dev)
> +{
> +    int rc;
> +    KdbrPort *port;
> +    char name[80] = {0};
> +    struct kdbr_reg reg;
> +
> +    port = malloc(sizeof(KdbrPort));
> +    if (!port) {
> +        pr_dbg("Fail to allocate memory for port object\n");
> +        return NULL;
> +    }
> +
> +    port->dev = PCI_DEVICE(dev);
> +
> +    pr_dbg("net=0x%llx\n", dev->ports[0].gid_tbl[0].global.subnet_prefix);
> +    pr_dbg("guid=0x%llx\n", dev->ports[0].gid_tbl[0].global.interface_id);
> +    reg.gid.net_id = dev->ports[0].gid_tbl[0].global.subnet_prefix;
> +    reg.gid.id = dev->ports[0].gid_tbl[0].global.interface_id;
> +    rc = ioctl(kdbr_fd, KDBR_REGISTER_PORT, &reg);
> +    if (rc < 0) {
> +        pr_err("Fail to allocate port\n");
> +        goto err_free_port;
> +    }
> +
> +    port->num = reg.port;
> +
> +    sprintf(name, KDBR_FILE_NAME "%d", port->num);
> +    port->fd = open(name, O_RDWR);
> +    if (port->fd < 0) {
> +        pr_err("Fail to open file %s\n", name);
> +        goto err_unregister_device;
> +    }
> +
> +    sprintf(name, "pvrdma_comp_%d", port->num);
> +    port->comp_thread.run = true;
> +    qemu_thread_create(&port->comp_thread.thread, name, comp_handler_thread,
> +                       port, QEMU_THREAD_DETACHED);
> +
> +    pr_info("Port %d (fd %d) allocated\n", port->num, port->fd);
> +
> +    return port;
> +
> +err_unregister_device:
> +    ioctl(kdbr_fd, KDBR_UNREGISTER_PORT, &port->num);
> +
> +err_free_port:
> +    free(port);
> +
> +    return NULL;
> +}
> +
> +void kdbr_free_port(KdbrPort *port)
> +{
> +    int rc;
> +
> +    if (!port) {
> +        return;
> +    }
> +
> +    rc = write(port->fd, (char *)0, 1);
> +    port->comp_thread.run = false;
> +    close(port->fd);
> +
> +    rc = ioctl(kdbr_fd, KDBR_UNREGISTER_PORT, &port->num);
> +    if (rc < 0) {
> +        pr_err("Fail to allocate port\n");
> +    }
> +
> +    free(port);
> +}
> +
> +unsigned long kdbr_open_connection(KdbrPort *port, u32 qpn,
> +                                   union pvrdma_gid dgid, u32 dqpn, bool rc_qp)
> +{
> +    int rc;
> +    struct kdbr_connection connection = {0};
> +
> +    connection.queue_id = qpn;
> +    connection.peer.rgid.net_id = dgid.global.subnet_prefix;
> +    connection.peer.rgid.id = dgid.global.interface_id;
> +    connection.peer.rqueue = dqpn;
> +    connection.ack_type = rc_qp ? KDBR_ACK_DELAYED : KDBR_ACK_IMMEDIATE;
> +
> +    rc = ioctl(port->fd, KDBR_PORT_OPEN_CONN, &connection);
> +    if (rc <= 0) {
> +        pr_err("Fail to open kdbr connection on port %d fd %d err %d\n",
> +               port->num, port->fd, rc);
> +        return 0;
> +    }
> +
> +    return (unsigned long)rc;
> +}
> +
> +void kdbr_close_connection(KdbrPort *port, unsigned long connection_id)
> +{
> +    int rc;
> +
> +    rc = ioctl(port->fd, KDBR_PORT_CLOSE_CONN, &connection_id);
> +    if (rc < 0) {
> +        pr_err("Fail to close kdbr connection on port %d\n",
> +               port->num);
> +    }
> +}
> +
> +void kdbr_register_tx_comp_handler(void (*comp_handler)(int status,
> +                                   unsigned int vendor_err, void *ctx))
> +{
> +    tx_comp_handler = comp_handler;
> +}
> +
> +void kdbr_register_rx_comp_handler(void (*comp_handler)(int status,
> +                                   unsigned int vendor_err, void *ctx))
> +{
> +    rx_comp_handler = comp_handler;
> +}
> +
> +void kdbr_send_wqe(KdbrPort *port, unsigned long connection_id, bool rc_qp,
> +                   struct RmSqWqe *wqe, void *ctx)
> +{
> +    KdbrCtx *sctx;
> +    int rc;
> +    int i;
> +
> +    pr_dbg("kdbr_port=%d\n", port->num);
> +    pr_dbg("kdbr_connection_id=%ld\n", connection_id);
> +    pr_dbg("wqe->hdr.num_sge=%d\n", wqe->hdr.num_sge);
> +
> +    /* Last minute validation - verify that kdbr supports num_sge */
> +    /* TODO: Make sure this will not happen! */
> +    if (wqe->hdr.num_sge > KDBR_MAX_IOVEC_LEN) {
> +        pr_err("Error: requested %d SGEs where kdbr supports %d\n",
> +               wqe->hdr.num_sge, KDBR_MAX_IOVEC_LEN);
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_TOO_MANY_SGES, ctx);
> +        return;
> +    }
> +
> +    sctx = malloc(sizeof(*sctx));
> +    if (!sctx) {
> +        pr_err("Fail to allocate kdbr request ctx\n");
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> +    }
> +
> +    memset(&sctx->req, 0, sizeof(sctx->req));
> +    sctx->req.flags = KDBR_REQ_SIGNATURE | KDBR_REQ_POST_SEND;
> +    sctx->req.connection_id = connection_id;
> +
> +    sctx->up_ctx = ctx;
> +    sctx->is_tx_req = 1;
> +
> +    rc = rm_alloc_wqe_ctx(PVRDMA_DEV(port->dev), &sctx->req.req_id, sctx);
> +    if (rc != 0) {
> +        pr_err("Fail to allocate request ID\n");
> +        free(sctx);
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> +        return;
> +    }
> +    sctx->req.vlen = wqe->hdr.num_sge;
> +
> +    for (i = 0; i < wqe->hdr.num_sge; i++) {
> +        struct pvrdma_sge *sge;
> +
> +        sge = &wqe->sge[i];
> +
> +        pr_dbg("addr=0x%llx\n", sge->addr);
> +        pr_dbg("length=%d\n", sge->length);
> +        pr_dbg("lkey=0x%x\n", sge->lkey);
> +
> +        sctx->req.vec[i].iov_base = pvrdma_pci_dma_map(port->dev, sge->addr,
> +                                                       sge->length);
> +        sctx->req.vec[i].iov_len = sge->length;
> +    }
> +
> +    if (!rc_qp) {
> +        sctx->req.peer.rqueue = wqe->hdr.wr.ud.remote_qpn;
> +        sctx->req.peer.rgid.net_id = *((unsigned long *)
> +                        &wqe->hdr.wr.ud.av.dgid[0]);
> +        sctx->req.peer.rgid.id = *((unsigned long *)
> +                        &wqe->hdr.wr.ud.av.dgid[8]);
> +    }
> +
> +    rc = write(port->fd, &sctx->req, sizeof(sctx->req));
> +    if (rc < 0) {
> +        pr_err("Fail (%d, %d) to post send WQE to port %d, conn_id %ld\n", rc,
> +               errno, port->num, connection_id);
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_FAIL_KDBR, ctx);
> +        return;
> +    }
> +}
> +
> +void kdbr_recv_wqe(KdbrPort *port, unsigned long connection_id,
> +                   struct RmRqWqe *wqe, void *ctx)
> +{
> +    KdbrCtx *sctx;
> +    int rc;
> +    int i;
> +
> +    pr_dbg("kdbr_port=%d\n", port->num);
> +    pr_dbg("kdbr_connection_id=%ld\n", connection_id);
> +    pr_dbg("wqe->hdr.num_sge=%d\n", wqe->hdr.num_sge);
> +
> +    /* Last minute validation - verify that kdbr supports num_sge */
> +    if (wqe->hdr.num_sge > KDBR_MAX_IOVEC_LEN) {
> +        pr_err("Error: requested %d SGEs where kdbr supports %d\n",
> +               wqe->hdr.num_sge, KDBR_MAX_IOVEC_LEN);
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_TOO_MANY_SGES, ctx);
> +        return;
> +    }
> +
> +    sctx = malloc(sizeof(*sctx));
> +    if (!sctx) {
> +        pr_err("Fail to allocate kdbr request ctx\n");
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> +    }
> +
> +    memset(&sctx->req, 0, sizeof(sctx->req));
> +    sctx->req.flags = KDBR_REQ_SIGNATURE | KDBR_REQ_POST_RECV;
> +    sctx->req.connection_id = connection_id;
> +
> +    sctx->up_ctx = ctx;
> +    sctx->is_tx_req = 0;
> +
> +    pr_dbg("sctx=%p\n", sctx);
> +    rc = rm_alloc_wqe_ctx(PVRDMA_DEV(port->dev), &sctx->req.req_id, sctx);
> +    if (rc != 0) {
> +        pr_err("Fail to allocate request ID\n");
> +        free(sctx);
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
> +        return;
> +    }
> +
> +    sctx->req.vlen = wqe->hdr.num_sge;
> +
> +    for (i = 0; i < wqe->hdr.num_sge; i++) {
> +        struct pvrdma_sge *sge;
> +
> +        sge = &wqe->sge[i];
> +
> +        pr_dbg("addr=0x%llx\n", sge->addr);
> +        pr_dbg("length=%d\n", sge->length);
> +        pr_dbg("lkey=0x%x\n", sge->lkey);
> +
> +        sctx->req.vec[i].iov_base = pvrdma_pci_dma_map(port->dev, sge->addr,
> +                                                       sge->length);
> +        sctx->req.vec[i].iov_len = sge->length;
> +    }
> +
> +    rc = write(port->fd, &sctx->req, sizeof(sctx->req));
> +    if (rc < 0) {
> +        pr_err("Fail (%d, %d) to post recv WQE to port %d, conn_id %ld\n", rc,
> +               errno, port->num, connection_id);
> +        tx_comp_handler(IB_WC_GENERAL_ERR, VENDOR_ERR_FAIL_KDBR, ctx);
> +        return;
> +    }
> +}
> +
> +static void dummy_comp_handler(int status, unsigned int vendor_err, void *ctx)
> +{
> +    pr_err("No completion handler is registered\n");
> +}
> +
> +int kdbr_init(void)
> +{
> +    kdbr_register_tx_comp_handler(dummy_comp_handler);
> +    kdbr_register_rx_comp_handler(dummy_comp_handler);
> +
> +    kdbr_fd = open(KDBR_FILE_NAME, 0);
> +    if (kdbr_fd < 0) {
> +        pr_dbg("Can't connect to kdbr, rc=%d\n", kdbr_fd);
> +        return -EIO;
> +    }
> +
> +    return 0;
> +}
> +
> +void kdbr_fini(void)
> +{
> +    close(kdbr_fd);
> +}
> diff --git a/hw/net/pvrdma/pvrdma_kdbr.h b/hw/net/pvrdma/pvrdma_kdbr.h
> new file mode 100644
> index 0000000..293a180
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_kdbr.h
> @@ -0,0 +1,53 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA QP Operations
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_KDBR_H
> +#define PVRDMA_KDBR_H
> +
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +#include <hw/net/pvrdma/pvrdma_ib_verbs.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/kdbr.h>
> +
> +typedef struct KdbrCompThread {
> +    QemuThread thread;
> +    QemuMutex mutex;
> +    bool run;
> +} KdbrCompThread;
> +
> +typedef struct KdbrPort {
> +    int num;
> +    int fd;
> +    KdbrCompThread comp_thread;
> +    PCIDevice *dev;
> +} KdbrPort;
> +
> +int kdbr_init(void);
> +void kdbr_fini(void);
> +KdbrPort *kdbr_alloc_port(PVRDMADev *dev);
> +void kdbr_free_port(KdbrPort *port);
> +void kdbr_register_tx_comp_handler(void (*comp_handler)(int status,
> +                                   unsigned int vendor_err, void *ctx));
> +void kdbr_register_rx_comp_handler(void (*comp_handler)(int status,
> +                                   unsigned int vendor_err, void *ctx));
> +unsigned long kdbr_open_connection(KdbrPort *port, u32 qpn,
> +                                   union pvrdma_gid dgid, u32 dqpn,
> +                                   bool rc_qp);
> +void kdbr_close_connection(KdbrPort *port, unsigned long connection_id);
> +void kdbr_send_wqe(KdbrPort *port, unsigned long connection_id, bool rc_qp,
> +                   struct RmSqWqe *wqe, void *ctx);
> +void kdbr_recv_wqe(KdbrPort *port, unsigned long connection_id,
> +                   struct RmRqWqe *wqe, void *ctx);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_main.c b/hw/net/pvrdma/pvrdma_main.c
> new file mode 100644
> index 0000000..5db802e
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_main.c
> @@ -0,0 +1,667 @@
> +#include <qemu/osdep.h>
> +#include <hw/hw.h>
> +#include <hw/pci/pci.h>
> +#include <hw/pci/pci_ids.h>
> +#include <hw/pci/msi.h>
> +#include <hw/pci/msix.h>
> +#include <hw/qdev-core.h>
> +#include <hw/qdev-properties.h>
> +#include <cpu.h>
> +
> +#include "hw/net/pvrdma/pvrdma.h"
> +#include "hw/net/pvrdma/pvrdma_defs.h"
> +#include "hw/net/pvrdma/pvrdma_utils.h"
> +#include "hw/net/pvrdma/pvrdma_dev_api.h"
> +#include "hw/net/pvrdma/pvrdma_rm.h"
> +#include "hw/net/pvrdma/pvrdma_kdbr.h"
> +#include "hw/net/pvrdma/pvrdma_qp_ops.h"
> +
> +static Property pvrdma_dev_properties[] = {
> +    DEFINE_PROP_UINT64("sys-image-guid", PVRDMADev, sys_image_guid, 0),
> +    DEFINE_PROP_UINT64("node-guid", PVRDMADev, node_guid, 0),
> +    DEFINE_PROP_UINT64("network-prefix", PVRDMADev, network_prefix, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void free_dev_ring(PCIDevice *pci_dev, Ring *ring, void *ring_state)
> +{
> +    ring_free(ring);
> +    pvrdma_pci_dma_unmap(pci_dev, ring_state, TARGET_PAGE_SIZE);
> +}
> +
> +static int init_dev_ring(Ring *ring, struct pvrdma_ring **ring_state,
> +                         const char *name, PCIDevice *pci_dev,
> +                         dma_addr_t dir_addr, u32 num_pages)
> +{
> +    __u64 *dir, *tbl;
> +    int rc = 0;
> +
> +    pr_dbg("Initializing device ring %s\n", name);
> +    pr_dbg("pdir_dma=0x%llx\n", (long long unsigned int)dir_addr);
> +    pr_dbg("num_pages=%d\n", num_pages);
> +    dir = pvrdma_pci_dma_map(pci_dev, dir_addr, TARGET_PAGE_SIZE);
> +    if (!dir) {
> +        pr_err("Fail to map to page directory\n");
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +    tbl = pvrdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
> +    if (!tbl) {
> +        pr_err("Fail to map to page table\n");
> +        rc = -ENOMEM;
> +        goto out_free_dir;
> +    }
> +
> +    *ring_state = pvrdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
> +    if (!*ring_state) {
> +        pr_err("Fail to map to ring state\n");
> +        rc = -ENOMEM;
> +        goto out_free_tbl;
> +    }
> +    /* RX ring is the second */
> +    (struct pvrdma_ring *)(*ring_state)++;
> +    rc = ring_init(ring, name, pci_dev, (struct pvrdma_ring *)*ring_state,
> +                   (num_pages - 1) * TARGET_PAGE_SIZE /
> +                   sizeof(struct pvrdma_cqne), sizeof(struct pvrdma_cqne),
> +                   (dma_addr_t *)&tbl[1], (dma_addr_t)num_pages - 1);
> +    if (rc != 0) {
> +        pr_err("Fail to initialize ring\n");
> +        rc = -ENOMEM;
> +        goto out_free_ring_state;
> +    }
> +
> +    goto out_free_tbl;
> +
> +out_free_ring_state:
> +    pvrdma_pci_dma_unmap(pci_dev, *ring_state, TARGET_PAGE_SIZE);
> +
> +out_free_tbl:
> +    pvrdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
> +
> +out_free_dir:
> +    pvrdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
> +
> +out:
> +    return rc;
> +}
> +
> +static void free_dsr(PVRDMADev *dev)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +    if (!dev->dsr_info.dsr) {
> +        return;
> +    }
> +
> +    free_dev_ring(pci_dev, &dev->dsr_info.async,
> +                  dev->dsr_info.async_ring_state);
> +
> +    free_dev_ring(pci_dev, &dev->dsr_info.cq, dev->dsr_info.cq_ring_state);
> +
> +    pvrdma_pci_dma_unmap(pci_dev, dev->dsr_info.req,
> +                         sizeof(union pvrdma_cmd_req));
> +
> +    pvrdma_pci_dma_unmap(pci_dev, dev->dsr_info.rsp,
> +                         sizeof(union pvrdma_cmd_resp));
> +
> +    pvrdma_pci_dma_unmap(pci_dev, dev->dsr_info.dsr,
> +                         sizeof(struct pvrdma_device_shared_region));
> +
> +    dev->dsr_info.dsr = NULL;
> +}
> +
> +static int load_dsr(PVRDMADev *dev)
> +{
> +    int rc = 0;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    DSRInfo *dsr_info;
> +    struct pvrdma_device_shared_region *dsr;
> +
> +    free_dsr(dev);
> +
> +    /* Map to DSR */
> +    pr_dbg("dsr_dma=0x%llx\n", (long long unsigned int)dev->dsr_info.dma);
> +    dev->dsr_info.dsr = pvrdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> +                                sizeof(struct pvrdma_device_shared_region));
> +    if (!dev->dsr_info.dsr) {
> +        pr_err("Fail to map to DSR\n");
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +
> +    /* Shortcuts */
> +    dsr_info = &dev->dsr_info;
> +    dsr = dsr_info->dsr;
> +
> +    /* Map to command slot */
> +    pr_dbg("cmd_dma=0x%llx\n", (long long unsigned int)dsr->cmd_slot_dma);
> +    dsr_info->req = pvrdma_pci_dma_map(pci_dev, dsr->cmd_slot_dma,
> +                                       sizeof(union pvrdma_cmd_req));
> +    if (!dsr_info->req) {
> +        pr_err("Fail to map to command slot address\n");
> +        rc = -ENOMEM;
> +        goto out_free_dsr;
> +    }
> +
> +    /* Map to response slot */
> +    pr_dbg("rsp_dma=0x%llx\n", (long long unsigned int)dsr->resp_slot_dma);
> +    dsr_info->rsp = pvrdma_pci_dma_map(pci_dev, dsr->resp_slot_dma,
> +                                       sizeof(union pvrdma_cmd_resp));
> +    if (!dsr_info->rsp) {
> +        pr_err("Fail to map to response slot address\n");
> +        rc = -ENOMEM;
> +        goto out_free_req;
> +    }
> +
> +    /* Map to CQ notification ring */
> +    rc = init_dev_ring(&dsr_info->cq, &dsr_info->cq_ring_state, "dev_cq",
> +                       pci_dev, dsr->cq_ring_pages.pdir_dma,
> +                       dsr->cq_ring_pages.num_pages);
> +    if (rc != 0) {
> +        pr_err("Fail to map to initialize CQ ring\n");
> +        rc = -ENOMEM;
> +        goto out_free_rsp;
> +    }
> +
> +    /* Map to event notification ring */
> +    rc = init_dev_ring(&dsr_info->async, &dsr_info->async_ring_state,
> +                       "dev_async", pci_dev, dsr->async_ring_pages.pdir_dma,
> +                       dsr->async_ring_pages.num_pages);
> +    if (rc != 0) {
> +        pr_err("Fail to map to initialize event ring\n");
> +        rc = -ENOMEM;
> +        goto out_free_rsp;
> +    }
> +
> +    goto out;
> +
> +out_free_rsp:
> +    pvrdma_pci_dma_unmap(pci_dev, dsr_info->rsp, sizeof(union pvrdma_cmd_resp));
> +
> +out_free_req:
> +    pvrdma_pci_dma_unmap(pci_dev, dsr_info->req, sizeof(union pvrdma_cmd_req));
> +
> +out_free_dsr:
> +    pvrdma_pci_dma_unmap(pci_dev, dsr_info->dsr,
> +                         sizeof(struct pvrdma_device_shared_region));
> +    dsr_info->dsr = NULL;
> +
> +out:
> +    return rc;
> +}
> +
> +static void init_dev_caps(PVRDMADev *dev)
> +{
> +    struct pvrdma_device_shared_region *dsr;
> +
> +    if (dev->dsr_info.dsr == NULL) {
> +        pr_err("Can't initialized DSR\n");
> +        return;
> +    }
> +
> +    dsr = dev->dsr_info.dsr;
> +
> +    dsr->caps.fw_ver = PVRDMA_FW_VERSION;
> +    pr_dbg("fw_ver=0x%lx\n", dsr->caps.fw_ver);
> +
> +    dsr->caps.mode = PVRDMA_DEVICE_MODE_ROCE;
> +    pr_dbg("mode=%d\n", dsr->caps.mode);
> +
> +    dsr->caps.gid_types |= PVRDMA_GID_TYPE_FLAG_ROCE_V1;
> +    pr_dbg("gid_types=0x%x\n", dsr->caps.gid_types);
> +
> +    dsr->caps.max_uar = RDMA_BAR2_UAR_SIZE;
> +    pr_dbg("max_uar=%d\n", dsr->caps.max_uar);
> +
> +    if (rm_get_max_pds(&dsr->caps.max_pd)) {
> +        return;
> +    }
> +    pr_dbg("max_pd=%d\n", dsr->caps.max_pd);
> +
> +    if (rm_get_max_gids(&dsr->caps.gid_tbl_len)) {
> +        return;
> +    }
> +    pr_dbg("gid_tbl_len=%d\n", dsr->caps.gid_tbl_len);
> +
> +    if (rm_get_max_cqs(&dsr->caps.max_cq)) {
> +        return;
> +    }
> +    pr_dbg("max_cq=%d\n", dsr->caps.max_cq);
> +
> +    if (rm_get_max_cqes(&dsr->caps.max_cqe)) {
> +        return;
> +    }
> +    pr_dbg("max_cqe=%d\n", dsr->caps.max_cqe);
> +
> +    if (rm_get_max_qps(&dsr->caps.max_qp)) {
> +        return;
> +    }
> +    pr_dbg("max_qp=%d\n", dsr->caps.max_qp);
> +
> +    dsr->caps.sys_image_guid = cpu_to_be64(dev->sys_image_guid);
> +    pr_dbg("sys_image_guid=%llx\n",
> +           (long long unsigned int)be64_to_cpu(dsr->caps.sys_image_guid));
> +
> +    dsr->caps.node_guid = cpu_to_be64(dev->node_guid);
> +    pr_dbg("node_guid=%llx\n",
> +           (long long unsigned int)be64_to_cpu(dsr->caps.node_guid));
> +
> +    if (rm_get_phys_port_cnt(&dsr->caps.phys_port_cnt)) {
> +        return;
> +    }
> +    pr_dbg("phys_port_cnt=%d\n", dsr->caps.phys_port_cnt);
> +
> +    if (rm_get_max_qp_wrs(&dsr->caps.max_qp_wr)) {
> +        return;
> +    }
> +    pr_dbg("max_qp_wr=%d\n", dsr->caps.max_qp_wr);
> +
> +    if (rm_get_max_sges(&dsr->caps.max_sge)) {
> +        return;
> +    }
> +    pr_dbg("max_sge=%d\n", dsr->caps.max_sge);
> +
> +    if (rm_get_max_mrs(&dsr->caps.max_mr)) {
> +        return;
> +    }
> +    pr_dbg("max_mr=%d\n", dsr->caps.max_mr);
> +
> +    if (rm_get_max_pkeys(&dsr->caps.max_pkeys)) {
> +        return;
> +    }
> +    pr_dbg("max_pkeys=%d\n", dsr->caps.max_pkeys);
> +
> +    if (rm_get_max_ah(&dsr->caps.max_ah)) {
> +        return;
> +    }
> +    pr_dbg("max_ah=%d\n", dsr->caps.max_ah);
> +
> +    pr_dbg("Initialized\n");
> +}
> +
> +static void free_ports(PVRDMADev *dev)
> +{
> +    int i;
> +
> +    for (i = 0; i < MAX_PORTS; i++) {
> +        free(dev->ports[i].gid_tbl);
> +        kdbr_free_port(dev->ports[i].kdbr_port);
> +    }
> +}
> +
> +static int init_ports(PVRDMADev *dev)
> +{
> +    int i, ret = 0;
> +    __u32 max_port_gids;
> +    __u32 max_port_pkeys;
> +
> +    memset(dev->ports, 0, sizeof(dev->ports));
> +
> +    ret = rm_get_max_port_gids(&max_port_gids);
> +    if (ret != 0) {
> +        goto err;
> +    }
> +
> +    ret = rm_get_max_port_pkeys(&max_port_pkeys);
> +    if (ret != 0) {
> +        goto err;
> +    }
> +
> +    for (i = 0; i < MAX_PORTS; i++) {
> +        dev->ports[i].state = PVRDMA_PORT_DOWN;
> +
> +        dev->ports[i].pkey_tbl = malloc(sizeof(*dev->ports[i].pkey_tbl) *
> +                                        max_port_pkeys);
> +        if (dev->ports[i].gid_tbl == NULL) {
> +            goto err_free_ports;
> +        }
> +
> +        memset(dev->ports[i].gid_tbl, 0, sizeof(dev->ports[i].gid_tbl));
> +    }
> +
> +    return 0;
> +
> +err_free_ports:
> +    free_ports(dev);
> +
> +err:
> +    pr_err("Fail to initialize device's ports\n");
> +
> +    return ret;
> +}
> +
> +static void activate_device(PVRDMADev *dev)
> +{
> +    set_reg_val(dev, PVRDMA_REG_ERR, 0);
> +    pr_dbg("Device activated\n");
> +}
> +
> +static int quiesce_device(PVRDMADev *dev)
> +{
> +    pr_dbg("Device quiesced\n");
> +    return 0;
> +}
> +
> +static int reset_device(PVRDMADev *dev)
> +{
> +    pr_dbg("Device reset complete\n");
> +    return 0;
> +}
> +
> +static uint64_t regs_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    PVRDMADev *dev = opaque;
> +    __u32 val;
> +
> +    /* pr_dbg("addr=0x%lx, size=%d\n", addr, size); */
> +
> +    if (get_reg_val(dev, addr, &val)) {
> +        pr_dbg("Error trying to read REG value from address 0x%x\n",
> +               (__u32)addr);
> +        return -EINVAL;
> +    }
> +
> +    /* pr_dbg("regs[0x%x]=0x%x\n", (__u32)addr, val); */
> +
> +    return val;
> +}
> +
> +static void regs_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> +{
> +    PVRDMADev *dev = opaque;
> +
> +    /* pr_dbg("addr=0x%lx, val=0x%x, size=%d\n", addr, (uint32_t)val, size); */
> +
> +    if (set_reg_val(dev, addr, val)) {
> +        pr_err("Error trying to set REG value, addr=0x%x, val=0x%lx\n",
> +               (__u32)addr, val);
> +        return;
> +    }
> +
> +    /* pr_dbg("regs[0x%x]=0x%lx\n", (__u32)addr, val); */
> +
> +    switch (addr) {
> +    case PVRDMA_REG_DSRLOW:
> +        dev->dsr_info.dma = val;
> +        break;
> +    case PVRDMA_REG_DSRHIGH:
> +        dev->dsr_info.dma |= val << 32;
> +        load_dsr(dev);
> +        init_dev_caps(dev);
> +        break;
> +    case PVRDMA_REG_CTL:
> +        switch (val) {
> +        case PVRDMA_DEVICE_CTL_ACTIVATE:
> +            activate_device(dev);
> +            break;
> +        case PVRDMA_DEVICE_CTL_QUIESCE:
> +            quiesce_device(dev);
> +            break;
> +        case PVRDMA_DEVICE_CTL_RESET:
> +            reset_device(dev);
> +            break;
> +        }
> +    case PVRDMA_REG_IMR:
> +        pr_dbg("Interrupt mask=0x%lx\n", val);
> +        dev->interrupt_mask = val;
> +        break;
> +    case PVRDMA_REG_REQUEST:
> +        if (val == 0) {
> +            execute_command(dev);
> +        }
> +    default:
> +        break;
> +    }
> +}
> +
> +static const MemoryRegionOps regs_ops = {
> +    .read = regs_read,
> +    .write = regs_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .impl = {
> +        .min_access_size = sizeof(uint32_t),
> +        .max_access_size = sizeof(uint32_t),
> +    },
> +};
> +
> +static uint64_t uar_read(void *opaque, hwaddr addr, unsigned size)
> +{
> +    PVRDMADev *dev = opaque;
> +    __u32 val;
> +
> +    pr_dbg("addr=0x%lx, size=%d\n", addr, size);
> +
> +    if (get_uar_val(dev, addr, &val)) {
> +        pr_dbg("Error trying to read UAR value from address 0x%x\n",
> +               (__u32)addr);
> +        return -EINVAL;
> +    }
> +
> +    pr_dbg("uar[0x%x]=0x%x\n", (__u32)addr, val);
> +
> +    return val;
> +}
> +
> +static void uar_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> +{
> +    PVRDMADev *dev = opaque;
> +
> +    /* pr_dbg("addr=0x%lx, val=0x%x, size=%d\n", addr, (uint32_t)val, size); */
> +
> +    if (set_uar_val(dev, addr, val)) {
> +        pr_err("Error trying to set UAR value, addr=0x%x, val=0x%lx\n",
> +               (__u32)addr, val);
> +        return;
> +    }
> +
> +    /* pr_dbg("uar[0x%x]=0x%lx\n", (__u32)addr, val); */
> +
> +    switch (addr) {
> +    case PVRDMA_UAR_QP_OFFSET:
> +        pr_dbg("UAR QP command, addr=0x%x, val=0x%lx\n", (__u32)addr, val);
> +        if (val & PVRDMA_UAR_QP_SEND) {
> +            qp_send(dev, val & PVRDMA_UAR_HANDLE_MASK);
> +        }
> +        if (val & PVRDMA_UAR_QP_RECV) {
> +            qp_recv(dev, val & PVRDMA_UAR_HANDLE_MASK);
> +        }
> +        break;
> +    case PVRDMA_UAR_CQ_OFFSET:
> +        pr_dbg("UAR CQ command, addr=0x%x, val=0x%lx\n", (__u32)addr, val);
> +        rm_req_notify_cq(dev, val & PVRDMA_UAR_HANDLE_MASK,
> +                 val & ~PVRDMA_UAR_HANDLE_MASK);
> +        break;
> +    default:
> +        pr_err("Unsupported command, addr=0x%x, val=0x%lx\n", (__u32)addr, val);
> +        break;
> +    }
> +}
> +
> +static const MemoryRegionOps uar_ops = {
> +    .read = uar_read,
> +    .write = uar_write,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
> +    .impl = {
> +        .min_access_size = sizeof(uint32_t),
> +        .max_access_size = sizeof(uint32_t),
> +    },
> +};
> +
> +static void init_pci_config(PCIDevice *pdev)
> +{
> +    pdev->config[PCI_INTERRUPT_PIN] = 1;
> +}
> +
> +static void init_bars(PCIDevice *pdev)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> +    /* BAR 0 - MSI-X */
> +    memory_region_init(&dev->msix, OBJECT(dev), "pvrdma-msix",
> +                       RDMA_BAR0_MSIX_SIZE);
> +    pci_register_bar(pdev, RDMA_MSIX_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY,
> +                     &dev->msix);
> +
> +    /* BAR 1 - Registers */
> +    memset(&dev->regs_data, 0, RDMA_BAR1_REGS_SIZE);
> +    memory_region_init_io(&dev->regs, OBJECT(dev), &regs_ops, dev,
> +                          "pvrdma-regs", RDMA_BAR1_REGS_SIZE);
> +    pci_register_bar(pdev, RDMA_REG_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY,
> +                     &dev->regs);
> +
> +    /* BAR 2 - UAR */
> +    memset(&dev->uar_data, 0, RDMA_BAR2_UAR_SIZE);
> +    memory_region_init_io(&dev->uar, OBJECT(dev), &uar_ops, dev, "rdma-uar",
> +                          RDMA_BAR2_UAR_SIZE);
> +    pci_register_bar(pdev, RDMA_UAR_BAR_IDX, PCI_BASE_ADDRESS_SPACE_MEMORY,
> +                     &dev->uar);
> +}
> +
> +static void init_regs(PCIDevice *pdev)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> +    set_reg_val(dev, PVRDMA_REG_VERSION, PVRDMA_HW_VERSION);
> +    set_reg_val(dev, PVRDMA_REG_ERR, 0xFFFF);
> +}
> +
> +static void uninit_msix(PCIDevice *pdev, int used_vectors)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(pdev);
> +    int i;
> +
> +    for (i = 0; i < used_vectors; i++) {
> +        msix_vector_unuse(pdev, i);
> +    }
> +
> +    msix_uninit(pdev, &dev->msix, &dev->msix);
> +}
> +
> +static int init_msix(PCIDevice *pdev)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(pdev);
> +    int i;
> +    int rc;
> +
> +    rc = msix_init(pdev, RDMA_MAX_INTRS, &dev->msix, RDMA_MSIX_BAR_IDX,
> +                   RDMA_MSIX_TABLE, &dev->msix, RDMA_MSIX_BAR_IDX,
> +                   RDMA_MSIX_PBA, 0, NULL);
> +
> +    if (rc < 0) {
> +        pr_err("Fail to initialize MSI-X\n");
> +        return rc;
> +    }
> +
> +    for (i = 0; i < RDMA_MAX_INTRS; i++) {
> +        rc = msix_vector_use(PCI_DEVICE(dev), i);
> +        if (rc < 0) {
> +            pr_err("Fail mark MSI-X vercor %d\n", i);
> +            uninit_msix(pdev, i);
> +            return rc;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int pvrdma_init(PCIDevice *pdev)
> +{
> +    int rc;
> +    PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> +    pr_info("Initializing device %s %x.%x\n", pdev->name,
> +            PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> +
> +    dev->dsr_info.dsr = NULL;
> +
> +    init_pci_config(pdev);
> +
> +    init_bars(pdev);
> +
> +    init_regs(pdev);
> +
> +    rc = init_msix(pdev);
> +    if (rc != 0) {
> +        goto out;
> +    }
> +
> +    rc = kdbr_init();
> +    if (rc != 0) {
> +        goto out;
> +    }
> +
> +    rc = rm_init(dev);
> +    if (rc != 0) {
> +        goto out;
> +    }
> +
> +    rc = init_ports(dev);
> +    if (rc != 0) {
> +        goto out;
> +    }
> +
> +    rc = qp_ops_init();
> +    if (rc != 0) {
> +        goto out;
> +    }
> +
> +out:
> +    if (rc != 0) {
> +        pr_err("Device fail to load\n");
> +    }
> +
> +    return rc;
> +}
> +
> +static void pvrdma_exit(PCIDevice *pdev)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(pdev);
> +
> +    pr_info("Closing device %s %x.%x\n", pdev->name,
> +            PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> +
> +    qp_ops_fini();
> +
> +    free_ports(dev);
> +
> +    rm_fini(dev);
> +
> +    kdbr_fini();
> +
> +    free_dsr(dev);
> +
> +    if (msix_enabled(pdev)) {
> +        uninit_msix(pdev, RDMA_MAX_INTRS);
> +    }
> +}
> +
> +static void pvrdma_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    k->init = pvrdma_init;
> +    k->exit = pvrdma_exit;
> +    k->vendor_id = PCI_VENDOR_ID_VMWARE;
> +    k->device_id = PCI_DEVICE_ID_VMWARE_PVRDMA;
> +    k->revision = 0x00;
> +    k->class_id = PCI_CLASS_NETWORK_OTHER;
> +
> +    dc->desc = "RDMA Device";
> +    dc->props = pvrdma_dev_properties;
> +    set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> +}
> +
> +static const TypeInfo pvrdma_info = {
> +    .name = PVRDMA_HW_NAME,
> +    .parent    = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(PVRDMADev),
> +    .class_init = pvrdma_class_init,
> +};
> +
> +static void register_types(void)
> +{
> +    type_register_static(&pvrdma_info);
> +}
> +
> +type_init(register_types)
> diff --git a/hw/net/pvrdma/pvrdma_qp_ops.c b/hw/net/pvrdma/pvrdma_qp_ops.c
> new file mode 100644
> index 0000000..2db45d9
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_qp_ops.c
> @@ -0,0 +1,174 @@
> +#include "hw/net/pvrdma/pvrdma.h"
> +#include "hw/net/pvrdma/pvrdma_utils.h"
> +#include "hw/net/pvrdma/pvrdma_qp_ops.h"
> +#include "hw/net/pvrdma/pvrdma_rm.h"
> +#include "hw/net/pvrdma/pvrdma-uapi.h"
> +#include "hw/net/pvrdma/pvrdma_kdbr.h"
> +#include "sysemu/dma.h"
> +#include "hw/pci/pci.h"
> +
> +typedef struct CompHandlerCtx {
> +    PVRDMADev *dev;
> +    u32 cq_handle;
> +    struct pvrdma_cqe cqe;
> +} CompHandlerCtx;
> +
> +/*
> + * 1. Put CQE on send CQ ring
> + * 2. Put CQ number on dsr completion ring
> + * 3. Interrupt host
> + */
> +static int post_cqe(PVRDMADev *dev, u32 cq_handle, struct pvrdma_cqe *cqe)
> +{
> +    struct pvrdma_cqe *cqe1;
> +    struct pvrdma_cqne *cqne;
> +    RmCQ *cq = rm_get_cq(dev, cq_handle);
> +
> +    if (!cq) {
> +        pr_dbg("Invalid cqn %d\n", cq_handle);
> +        return -EINVAL;
> +    }
> +
> +    pr_dbg("cq->comp_type=%d\n", cq->comp_type);
> +    if (cq->comp_type == CCT_NONE) {
> +        return 0;
> +    }
> +    cq->comp_type = CCT_NONE;
> +
> +    /* Step #1: Put CQE on CQ ring */
> +    pr_dbg("Writing CQE\n");
> +    cqe1 = ring_next_elem_write(&cq->cq);
> +    if (!cqe1) {
> +        return -EINVAL;
> +    }
> +
> +    memcpy(cqe1, cqe, sizeof(*cqe));
> +    ring_write_inc(&cq->cq);
> +
> +    /* Step #2: Put CQ number on dsr completion ring */
> +    pr_dbg("Writing CQNE\n");
> +    cqne = ring_next_elem_write(&dev->dsr_info.cq);
> +    if (!cqne) {
> +        return -EINVAL;
> +    }
> +
> +    cqne->info = cq_handle;
> +    ring_write_inc(&dev->dsr_info.cq);
> +
> +    post_interrupt(dev, INTR_VEC_CMD_COMPLETION_Q);
> +
> +    return 0;
> +}
> +
> +static void qp_ops_comp_handler(int status, unsigned int vendor_err, void *ctx)
> +{
> +    CompHandlerCtx *comp_ctx = (CompHandlerCtx *)ctx;
> +
> +    pr_dbg("cq_handle=%d\n", comp_ctx->cq_handle);
> +    pr_dbg("wr_id=%lld\n", comp_ctx->cqe.wr_id);
> +    pr_dbg("status=%d\n", status);
> +    pr_dbg("vendor_err=0x%x\n", vendor_err);
> +    comp_ctx->cqe.status = status;
> +    comp_ctx->cqe.vendor_err = vendor_err;
> +    post_cqe(comp_ctx->dev, comp_ctx->cq_handle, &comp_ctx->cqe);
> +    free(ctx);
> +}
> +
> +void qp_ops_fini(void)
> +{
> +}
> +
> +int qp_ops_init(void)
> +{
> +    kdbr_register_tx_comp_handler(qp_ops_comp_handler);
> +    kdbr_register_rx_comp_handler(qp_ops_comp_handler);
> +
> +    return 0;
> +}
> +
> +int qp_send(PVRDMADev *dev, __u32 qp_handle)
> +{
> +    RmQP *qp;
> +    RmSqWqe *wqe;
> +
> +    qp = rm_get_qp(dev, qp_handle);
> +    if (!qp) {
> +        return -EINVAL;
> +    }
> +
> +    if (qp->qp_state < PVRDMA_QPS_RTS) {
> +        pr_dbg("Invalid QP state for send\n");
> +        return -EINVAL;
> +    }
> +
> +    wqe = (struct RmSqWqe *)ring_next_elem_read(&qp->sq);
> +    while (wqe) {
> +        CompHandlerCtx *comp_ctx;
> +
> +        pr_dbg("wr_id=%lld\n", wqe->hdr.wr_id);
> +        wqe->hdr.num_sge = MIN(wqe->hdr.num_sge,
> +                       qp->init_args.max_send_sge);
> +
> +        /* Prepare CQE */
> +        comp_ctx = malloc(sizeof(CompHandlerCtx));
> +        comp_ctx->dev = dev;
> +        comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
> +        comp_ctx->cqe.qp = qp_handle;
> +        comp_ctx->cq_handle = qp->init_args.send_cq_handle;
> +        comp_ctx->cqe.opcode = wqe->hdr.opcode;
> +        /* TODO: Fill rest of the data */
> +
> +        kdbr_send_wqe(dev->ports[qp->port_num].kdbr_port,
> +                      qp->kdbr_connection_id,
> +                      qp->init_args.qp_type == PVRDMA_QPT_RC, wqe, comp_ctx);
> +
> +        ring_read_inc(&qp->sq);
> +
> +        wqe = ring_next_elem_read(&qp->sq);
> +    }
> +
> +    return 0;
> +}
> +
> +int qp_recv(PVRDMADev *dev, __u32 qp_handle)
> +{
> +    RmQP *qp;
> +    RmRqWqe *wqe;
> +
> +    qp = rm_get_qp(dev, qp_handle);
> +    if (!qp) {
> +        return -EINVAL;
> +    }
> +
> +    if (qp->qp_state < PVRDMA_QPS_RTR) {
> +        pr_dbg("Invalid QP state for receive\n");
> +        return -EINVAL;
> +    }
> +
> +    wqe = (struct RmRqWqe *)ring_next_elem_read(&qp->rq);
> +    while (wqe) {
> +        CompHandlerCtx *comp_ctx;
> +
> +        pr_dbg("wr_id=%lld\n", wqe->hdr.wr_id);
> +        wqe->hdr.num_sge = MIN(wqe->hdr.num_sge,
> +                       qp->init_args.max_send_sge);
> +
> +        /* Prepare CQE */
> +        comp_ctx = malloc(sizeof(CompHandlerCtx));
> +        comp_ctx->dev = dev;
> +        comp_ctx->cqe.qp = qp_handle;
> +        comp_ctx->cq_handle = qp->init_args.recv_cq_handle;
> +        comp_ctx->cqe.wr_id = wqe->hdr.wr_id;
> +        comp_ctx->cqe.qp = qp_handle;
> +        /* TODO: Fill rest of the data */
> +
> +        kdbr_recv_wqe(dev->ports[qp->port_num].kdbr_port,
> +                      qp->kdbr_connection_id, wqe, comp_ctx);
> +
> +        ring_read_inc(&qp->rq);
> +
> +        wqe = ring_next_elem_read(&qp->rq);
> +    }
> +
> +    return 0;
> +}
> diff --git a/hw/net/pvrdma/pvrdma_qp_ops.h b/hw/net/pvrdma/pvrdma_qp_ops.h
> new file mode 100644
> index 0000000..20125d6
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_qp_ops.h
> @@ -0,0 +1,25 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA QP Operations
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_QP_H
> +#define PVRDMA_QP_H
> +
> +typedef struct PVRDMADev PVRDMADev;
> +
> +int qp_ops_init(void);
> +void qp_ops_fini(void);
> +int qp_send(PVRDMADev *dev, __u32 qp_handle);
> +int qp_recv(PVRDMADev *dev, __u32 qp_handle);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_ring.c b/hw/net/pvrdma/pvrdma_ring.c
> new file mode 100644
> index 0000000..34dc1f5
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_ring.c
> @@ -0,0 +1,127 @@
> +#include <qemu/osdep.h>
> +#include <hw/pci/pci.h>
> +#include <cpu.h>
> +#include <hw/net/pvrdma/pvrdma_ring.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +
> +int ring_init(Ring *ring, const char *name, PCIDevice *dev,
> +              struct pvrdma_ring *ring_state, size_t max_elems, size_t elem_sz,
> +              dma_addr_t *tbl, dma_addr_t npages)
> +{
> +    int i;
> +    int rc = 0;
> +
> +    strncpy(ring->name, name, MAX_RING_NAME_SZ);
> +    ring->name[MAX_RING_NAME_SZ - 1] = 0;
> +    pr_info("Initializing %s ring\n", ring->name);
> +    ring->dev = dev;
> +    ring->ring_state = ring_state;
> +    ring->max_elems = max_elems;
> +    ring->elem_sz = elem_sz;
> +    pr_dbg("ring->elem_sz=%ld\n", ring->elem_sz);
> +    pr_dbg("npages=%ld\n", npages);
> +    /* TODO: Give a moment to think if we want to redo driver settings
> +    atomic_set(&ring->ring_state->prod_tail, 0);
> +    atomic_set(&ring->ring_state->cons_head, 0);
> +    */
> +    ring->npages = npages;
> +    ring->pages = malloc(npages * sizeof(void *));
> +    for (i = 0; i < npages; i++) {
> +        if (!tbl[i]) {
> +            pr_err("npages=%ld but tbl[%d] is NULL\n", npages, i);
> +            continue;
> +        }
> +
> +        ring->pages[i] = pvrdma_pci_dma_map(dev, tbl[i], TARGET_PAGE_SIZE);
> +        if (!ring->pages[i]) {
> +            rc = -ENOMEM;
> +            pr_err("Fail to map to page %d\n", i);
> +            goto out_free;
> +        }
> +    }
> +
> +    goto out;
> +
> +out_free:
> +    while (i--) {
> +        pvrdma_pci_dma_unmap(dev, ring->pages[i], TARGET_PAGE_SIZE);
> +    }
> +    free(ring->pages);
> +
> +out:
> +    return rc;
> +}
> +
> +void *ring_next_elem_read(Ring *ring)
> +{
> +    unsigned int idx = 0, offset;
> +
> +    /*
> +    pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
> +           ring->ring_state->cons_head);
> +    */
> +
> +    if (!pvrdma_idx_ring_has_data(ring->ring_state, ring->max_elems, &idx)) {
> +        pr_dbg("No more data in ring\n");
> +        return NULL;
> +    }
> +
> +    offset = idx * ring->elem_sz;
> +    /*
> +    pr_dbg("idx=%d\n", idx);
> +    pr_dbg("offset=%d\n", offset);
> +    */
> +    return ring->pages[offset / TARGET_PAGE_SIZE] + (offset % TARGET_PAGE_SIZE);
> +}
> +
> +void ring_read_inc(Ring *ring)
> +{
> +    pvrdma_idx_ring_inc(&ring->ring_state->cons_head, ring->max_elems);
> +    /*
> +    pr_dbg("%s: t=%d, h=%d, m=%ld\n", ring->name,
> +           ring->ring_state->prod_tail, ring->ring_state->cons_head,
> +           ring->max_elems);
> +    */
> +}
> +
> +void *ring_next_elem_write(Ring *ring)
> +{
> +    unsigned int idx, offset, tail;
> +
> +    /*
> +    pr_dbg("%s: t=%d, h=%d\n", ring->name, ring->ring_state->prod_tail,
> +           ring->ring_state->cons_head);
> +    */
> +
> +    if (!pvrdma_idx_ring_has_space(ring->ring_state, ring->max_elems, &tail)) {
> +        pr_dbg("CQ is full\n");
> +        return NULL;
> +    }
> +
> +    idx = pvrdma_idx(&ring->ring_state->prod_tail, ring->max_elems);
> +    /* TODO: tail == idx */
> +
> +    offset = idx * ring->elem_sz;
> +    return ring->pages[offset / TARGET_PAGE_SIZE] + (offset % TARGET_PAGE_SIZE);
> +}
> +
> +void ring_write_inc(Ring *ring)
> +{
> +    pvrdma_idx_ring_inc(&ring->ring_state->prod_tail, ring->max_elems);
> +    /*
> +    pr_dbg("%s: t=%d, h=%d, m=%ld\n", ring->name,
> +           ring->ring_state->prod_tail, ring->ring_state->cons_head,
> +           ring->max_elems);
> +    */
> +}
> +
> +void ring_free(Ring *ring)
> +{
> +    while (ring->npages--) {
> +        pvrdma_pci_dma_unmap(ring->dev, ring->pages[ring->npages],
> +                             TARGET_PAGE_SIZE);
> +    }
> +
> +    free(ring->pages);
> +}
> diff --git a/hw/net/pvrdma/pvrdma_ring.h b/hw/net/pvrdma/pvrdma_ring.h
> new file mode 100644
> index 0000000..8a0c448
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_ring.h
> @@ -0,0 +1,43 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_RING_H
> +#define PVRDMA_RING_H
> +
> +#include <qemu/typedefs.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_types.h>
> +
> +#define MAX_RING_NAME_SZ 16
> +
> +typedef struct Ring {
> +    char name[MAX_RING_NAME_SZ];
> +    PCIDevice *dev;
> +    size_t max_elems;
> +    size_t elem_sz;
> +    struct pvrdma_ring *ring_state;
> +    int npages;
> +    void **pages;
> +} Ring;
> +
> +int ring_init(Ring *ring, const char *name, PCIDevice *dev,
> +              struct pvrdma_ring *ring_state, size_t max_elems, size_t elem_sz,
> +              dma_addr_t *tbl, dma_addr_t npages);
> +void *ring_next_elem_read(Ring *ring);
> +void ring_read_inc(Ring *ring);
> +void *ring_next_elem_write(Ring *ring);
> +void ring_write_inc(Ring *ring);
> +void ring_free(Ring *ring);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_rm.c b/hw/net/pvrdma/pvrdma_rm.c
> new file mode 100644
> index 0000000..55ca1e5
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_rm.c
> @@ -0,0 +1,529 @@
> +#include <hw/net/pvrdma/pvrdma.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +#include <hw/net/pvrdma/pvrdma_rm.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_kdbr.h>
> +#include <qemu/bitmap.h>
> +#include <qemu/atomic.h>
> +#include <cpu.h>
> +
> +/* Page directory and page tables */
> +#define PG_DIR_SZ { TARGET_PAGE_SIZE / sizeof(__u64) }
> +#define PG_TBL_SZ { TARGET_PAGE_SIZE / sizeof(__u64) }
> +
> +/* Global local and remote keys */
> +__u64 global_lkey = 1;
> +__u64 global_rkey = 1;
> +
> +static inline int res_tbl_init(const char *name, RmResTbl *tbl, u32 tbl_sz,
> +                               u32 res_sz)
> +{
> +    tbl->tbl = malloc(tbl_sz * res_sz);
> +    if (!tbl->tbl) {
> +        return -ENOMEM;
> +    }
> +
> +    strncpy(tbl->name, name, MAX_RING_NAME_SZ);
> +    tbl->name[MAX_RING_NAME_SZ - 1] = 0;
> +
> +    tbl->bitmap = bitmap_new(tbl_sz);
> +    tbl->tbl_sz = tbl_sz;
> +    tbl->res_sz = res_sz;
> +    qemu_mutex_init(&tbl->lock);
> +
> +    return 0;
> +}
> +
> +static inline void res_tbl_free(RmResTbl *tbl)
> +{
> +    qemu_mutex_destroy(&tbl->lock);
> +    free(tbl->tbl);
> +    bitmap_zero_extend(tbl->bitmap, tbl->tbl_sz, 0);
> +}
> +
> +static inline void *res_tbl_get(RmResTbl *tbl, u32 handle)
> +{
> +    pr_dbg("%s, handle=%d\n", tbl->name, handle);
> +
> +    if ((handle < tbl->tbl_sz) && (test_bit(handle, tbl->bitmap))) {
> +        return tbl->tbl + handle * tbl->res_sz;
> +    } else {
> +        pr_dbg("Invalid handle %d\n", handle);
> +        return NULL;
> +    }
> +}
> +
> +static inline void *res_tbl_alloc(RmResTbl *tbl, u32 *handle)
> +{
> +    qemu_mutex_lock(&tbl->lock);
> +
> +    *handle = find_first_zero_bit(tbl->bitmap, tbl->tbl_sz);
> +    if (*handle > tbl->tbl_sz) {
> +        pr_dbg("Fail to alloc, bitmap is full\n");
> +        qemu_mutex_unlock(&tbl->lock);
> +        return NULL;
> +    }
> +
> +    set_bit(*handle, tbl->bitmap);
> +
> +    qemu_mutex_unlock(&tbl->lock);
> +
> +    pr_dbg("%s, handle=%d\n", tbl->name, *handle);
> +
> +    return tbl->tbl + *handle * tbl->res_sz;
> +}
> +
> +static inline void res_tbl_dealloc(RmResTbl *tbl, u32 handle)
> +{
> +    pr_dbg("%s, handle=%d\n", tbl->name, handle);
> +
> +    qemu_mutex_lock(&tbl->lock);
> +
> +    if (handle < tbl->tbl_sz) {
> +        clear_bit(handle, tbl->bitmap);
> +    }
> +
> +    qemu_mutex_unlock(&tbl->lock);
> +}
> +
> +int rm_alloc_pd(PVRDMADev *dev, __u32 *pd_handle, __u32 ctx_handle)
> +{
> +    RmPD *pd;
> +
> +    pd = res_tbl_alloc(&dev->pd_tbl, pd_handle);
> +    if (!pd) {
> +        return -ENOMEM;
> +    }
> +
> +    pd->ctx_handle = ctx_handle;
> +
> +    return 0;
> +}
> +
> +void rm_dealloc_pd(PVRDMADev *dev, __u32 pd_handle)
> +{
> +    res_tbl_dealloc(&dev->pd_tbl, pd_handle);
> +}
> +
> +RmCQ *rm_get_cq(PVRDMADev *dev, __u32 cq_handle)
> +{
> +    return res_tbl_get(&dev->cq_tbl, cq_handle);
> +}
> +
> +int rm_alloc_cq(PVRDMADev *dev, struct pvrdma_cmd_create_cq *cmd,
> +                struct pvrdma_cmd_create_cq_resp *resp)
> +{
> +    int rc = 0;
> +    RmCQ *cq;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    __u64 *dir = 0, *tbl = 0;
> +    char ring_name[MAX_RING_NAME_SZ];
> +    u32 cqe;
> +
> +    cq = res_tbl_alloc(&dev->cq_tbl, &resp->cq_handle);
> +    if (!cq) {
> +        return -ENOMEM;
> +    }
> +
> +    memset(cq, 0, sizeof(RmCQ));
> +
> +    memcpy(&cq->init_args, cmd, sizeof(*cmd));
> +    cq->comp_type = CCT_NONE;
> +
> +    /* Get pointer to CQ */
> +    dir = pvrdma_pci_dma_map(pci_dev, cq->init_args.pdir_dma, TARGET_PAGE_SIZE);
> +    if (!dir) {
> +        pr_err("Fail to map to CQ page directory\n");
> +        rc = -ENOMEM;
> +        goto out_free_cq;
> +    }
> +    tbl = pvrdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
> +    if (!tbl) {
> +        pr_err("Fail to map to CQ page table\n");
> +        rc = -ENOMEM;
> +        goto out_free_cq;
> +    }
> +
> +    cq->ring_state = (struct pvrdma_ring *)
> +            pvrdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
> +    if (!cq->ring_state) {
> +        pr_err("Fail to map to CQ header page\n");
> +        rc = -ENOMEM;
> +        goto out_free_cq;
> +    }
> +
> +    sprintf(ring_name, "cq%d", resp->cq_handle);
> +    cqe = MIN(cmd->cqe, dev->dsr_info.dsr->caps.max_cqe);
> +    rc = ring_init(&cq->cq, ring_name, pci_dev, &cq->ring_state[1],
> +                   cqe, sizeof(struct pvrdma_cqe), (dma_addr_t *)&tbl[1],
> +                   cmd->nchunks - 1 /* first page is ring state */);
> +    if (rc != 0) {
> +        pr_err("Fail to initialize CQ ring\n");
> +        rc = -ENOMEM;
> +        goto out_free_ring_state;
> +    }
> +
> +
> +    resp->cqe = cmd->cqe;
> +
> +    goto out;
> +
> +out_free_ring_state:
> +    pvrdma_pci_dma_unmap(pci_dev, cq->ring_state, TARGET_PAGE_SIZE);
> +
> +out_free_cq:
> +    rm_dealloc_cq(dev, resp->cq_handle);
> +
> +out:
> +    if (tbl) {
> +        pvrdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
> +    }
> +    if (dir) {
> +        pvrdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
> +    }
> +
> +    return rc;
> +}
> +
> +void rm_req_notify_cq(PVRDMADev *dev, __u32 cq_handle, u32 flags)
> +{
> +    RmCQ *cq;
> +
> +    pr_dbg("cq_handle=%d, flags=0x%x\n", cq_handle, flags);
> +
> +    cq = rm_get_cq(dev, cq_handle);
> +    if (!cq) {
> +        return;
> +    }
> +
> +    cq->comp_type = (flags & PVRDMA_UAR_CQ_ARM_SOL) ? CCT_SOLICITED :
> +                     CCT_NEXT_COMP;
> +    pr_dbg("comp_type=%d\n", cq->comp_type);
> +}
> +
> +void rm_dealloc_cq(PVRDMADev *dev, __u32 cq_handle)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    RmCQ *cq;
> +
> +    cq = rm_get_cq(dev, cq_handle);
> +    if (!cq) {
> +        return;
> +    }
> +
> +    ring_free(&cq->cq);
> +    pvrdma_pci_dma_unmap(pci_dev, cq->ring_state, TARGET_PAGE_SIZE);
> +    res_tbl_dealloc(&dev->cq_tbl, cq_handle);
> +}
> +
> +int rm_alloc_mr(PVRDMADev *dev, struct pvrdma_cmd_create_mr *cmd,
> +                struct pvrdma_cmd_create_mr_resp *resp)
> +{
> +    RmMR *mr;
> +
> +    mr = res_tbl_alloc(&dev->mr_tbl, &resp->mr_handle);
> +    if (!mr) {
> +        return -ENOMEM;
> +    }
> +
> +    mr->pd_handle = cmd->pd_handle;
> +    resp->lkey = mr->lkey = global_lkey++;
> +    resp->rkey = mr->rkey = global_rkey++;
> +
> +    return 0;
> +}
> +
> +void rm_dealloc_mr(PVRDMADev *dev, __u32 mr_handle)
> +{
> +    res_tbl_dealloc(&dev->mr_tbl, mr_handle);
> +}
> +
> +int rm_alloc_qp(PVRDMADev *dev, struct pvrdma_cmd_create_qp *cmd,
> +                struct pvrdma_cmd_create_qp_resp *resp)
> +{
> +    int rc = 0;
> +    RmQP *qp;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    __u64 *dir = 0, *tbl = 0;
> +    int wqe_size;
> +    char ring_name[MAX_RING_NAME_SZ];
> +
> +    if (!rm_get_cq(dev, cmd->send_cq_handle) ||
> +        !rm_get_cq(dev, cmd->recv_cq_handle)) {
> +        pr_err("Invalid send_cqn or recv_cqn (%d, %d)\n",
> +               cmd->send_cq_handle, cmd->recv_cq_handle);
> +        return -EINVAL;
> +    }
> +
> +    qp = res_tbl_alloc(&dev->qp_tbl, &resp->qpn);
> +    if (!qp) {
> +        return -EINVAL;
> +    }
> +
> +    memset(qp, 0, sizeof(RmQP));
> +
> +    memcpy(&qp->init_args, cmd, sizeof(*cmd));
> +
> +    pr_dbg("qp_type=%d\n", qp->init_args.qp_type);
> +    pr_dbg("send_cq_handle=%d\n", qp->init_args.send_cq_handle);
> +    pr_dbg("max_send_sge=%d\n", qp->init_args.max_send_sge);
> +    pr_dbg("recv_cq_handle=%d\n", qp->init_args.recv_cq_handle);
> +    pr_dbg("max_recv_sge=%d\n", qp->init_args.max_recv_sge);
> +    pr_dbg("total_chunks=%d\n", cmd->total_chunks);
> +    pr_dbg("send_chunks=%d\n", cmd->send_chunks);
> +    pr_dbg("recv_chunks=%d\n", cmd->total_chunks - cmd->send_chunks);
> +
> +    qp->qp_state = PVRDMA_QPS_ERR;
> +
> +    /* Get pointer to send & recv rings */
> +    dir = pvrdma_pci_dma_map(pci_dev, qp->init_args.pdir_dma, TARGET_PAGE_SIZE);
> +    if (!dir) {
> +        pr_err("Fail to map to QP page directory\n");
> +        rc = -ENOMEM;
> +        goto out_free_qp;
> +    }
> +    tbl = pvrdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
> +    if (!tbl) {
> +        pr_err("Fail to map to QP page table\n");
> +        rc = -ENOMEM;
> +        goto out_free_qp;
> +    }
> +
> +    /* Send ring */
> +    qp->sq_ring_state = (struct pvrdma_ring *)
> +            pvrdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
> +    if (!qp->sq_ring_state) {
> +        pr_err("Fail to map to QP header page\n");
> +        rc = -ENOMEM;
> +        goto out_free_qp;
> +    }
> +
> +    wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_sq_wqe_hdr) +
> +                                  sizeof(struct pvrdma_sge) *
> +                                  qp->init_args.max_send_sge);
> +    sprintf(ring_name, "qp%d_sq", resp->qpn);
> +    rc = ring_init(&qp->sq, ring_name, pci_dev, qp->sq_ring_state,
> +                   qp->init_args.max_send_wr, wqe_size,
> +                   (dma_addr_t *)&tbl[1], cmd->send_chunks);
> +    if (rc != 0) {
> +        pr_err("Fail to initialize SQ ring\n");
> +        rc = -ENOMEM;
> +        goto out_free_ring_state;
> +    }
> +
> +    /* Recv ring */
> +    qp->rq_ring_state = &qp->sq_ring_state[1];
> +    wqe_size = roundup_pow_of_two(sizeof(struct pvrdma_rq_wqe_hdr) +
> +                                  sizeof(struct pvrdma_sge) *
> +                                  qp->init_args.max_recv_sge);
> +    pr_dbg("wqe_size=%d\n", wqe_size);
> +    pr_dbg("pvrdma_rq_wqe_hdr=%ld\n", sizeof(struct pvrdma_rq_wqe_hdr));
> +    pr_dbg("pvrdma_sge=%ld\n", sizeof(struct pvrdma_sge));
> +    pr_dbg("init_args.max_recv_sge=%d\n", qp->init_args.max_recv_sge);
> +    sprintf(ring_name, "qp%d_rq", resp->qpn);
> +    rc = ring_init(&qp->rq, ring_name, pci_dev, qp->rq_ring_state,
> +                   qp->init_args.max_recv_wr, wqe_size,
> +                   (dma_addr_t *)&tbl[2], cmd->total_chunks -
> +                   cmd->send_chunks - 1 /* first page is ring state */);
> +    if (rc != 0) {
> +        pr_err("Fail to initialize RQ ring\n");
> +        rc = -ENOMEM;
> +        goto out_free_send_ring;
> +    }
> +
> +    resp->max_send_wr = cmd->max_send_wr;
> +    resp->max_recv_wr = cmd->max_recv_wr;
> +    resp->max_send_sge = cmd->max_send_sge;
> +    resp->max_recv_sge = cmd->max_recv_sge;
> +    resp->max_inline_data = cmd->max_inline_data;
> +
> +    goto out;
> +
> +out_free_send_ring:
> +    ring_free(&qp->sq);
> +
> +out_free_ring_state:
> +    pvrdma_pci_dma_unmap(pci_dev, qp->sq_ring_state, TARGET_PAGE_SIZE);
> +
> +out_free_qp:
> +    rm_dealloc_qp(dev, resp->qpn);
> +
> +out:
> +    if (tbl) {
> +        pvrdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
> +    }
> +    if (dir) {
> +        pvrdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
> +    }
> +
> +    return rc;
> +}
> +
> +int rm_modify_qp(PVRDMADev *dev, __u32 qp_handle,
> +                 struct pvrdma_cmd_modify_qp *modify_qp_args)
> +{
> +    RmQP *qp;
> +
> +    pr_dbg("qp_handle=%d\n", qp_handle);
> +    pr_dbg("new_state=%d\n", modify_qp_args->attrs.qp_state);
> +
> +    qp = res_tbl_get(&dev->qp_tbl, qp_handle);
> +    if (!qp) {
> +        return -EINVAL;
> +    }
> +
> +    pr_dbg("qp_type=%d\n", qp->init_args.qp_type);
> +
> +    if (modify_qp_args->attr_mask & PVRDMA_QP_PORT) {
> +        qp->port_num = modify_qp_args->attrs.port_num - 1;
> +    }
> +    if (modify_qp_args->attr_mask & PVRDMA_QP_DEST_QPN) {
> +        qp->dest_qp_num = modify_qp_args->attrs.dest_qp_num;
> +    }
> +    if (modify_qp_args->attr_mask & PVRDMA_QP_AV) {
> +        qp->dgid = modify_qp_args->attrs.ah_attr.grh.dgid;
> +        qp->port_num = modify_qp_args->attrs.ah_attr.port_num - 1;
> +    }
> +    if (modify_qp_args->attr_mask & PVRDMA_QP_STATE) {
> +        qp->qp_state = modify_qp_args->attrs.qp_state;
> +    }
> +
> +    /* kdbr connection */
> +    if (qp->qp_state == PVRDMA_QPS_RTR) {
> +        qp->kdbr_connection_id =
> +            kdbr_open_connection(dev->ports[qp->port_num].kdbr_port,
> +                                 qp_handle, qp->dgid, qp->dest_qp_num,
> +                                 qp->init_args.qp_type == PVRDMA_QPT_RC);
> +        if (qp->kdbr_connection_id == 0) {
> +            return -EIO;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +void rm_dealloc_qp(PVRDMADev *dev, __u32 qp_handle)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    RmQP *qp;
> +
> +    qp = res_tbl_get(&dev->qp_tbl, qp_handle);
> +    if (!qp) {
> +        return;
> +    }
> +
> +    if (qp->kdbr_connection_id) {
> +        kdbr_close_connection(dev->ports[qp->port_num].kdbr_port,
> +                              qp->kdbr_connection_id);
> +    }
> +
> +    ring_free(&qp->rq);
> +    ring_free(&qp->sq);
> +
> +    pvrdma_pci_dma_unmap(pci_dev, qp->sq_ring_state, TARGET_PAGE_SIZE);
> +
> +    res_tbl_dealloc(&dev->qp_tbl, qp_handle);
> +}
> +
> +RmQP *rm_get_qp(PVRDMADev *dev, __u32 qp_handle)
> +{
> +    return res_tbl_get(&dev->qp_tbl, qp_handle);
> +}
> +
> +void *rm_get_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id)
> +{
> +    void **wqe_ctx;
> +
> +    wqe_ctx = res_tbl_get(&dev->wqe_ctx_tbl, wqe_ctx_id);
> +    if (!wqe_ctx) {
> +        return NULL;
> +    }
> +
> +    pr_dbg("ctx=%p\n", *wqe_ctx);
> +
> +    return *wqe_ctx;
> +}
> +
> +int rm_alloc_wqe_ctx(PVRDMADev *dev, unsigned long *wqe_ctx_id, void *ctx)
> +{
> +    void **wqe_ctx;
> +
> +    wqe_ctx = res_tbl_alloc(&dev->wqe_ctx_tbl, (u32 *)wqe_ctx_id);
> +    if (!wqe_ctx) {
> +        return -ENOMEM;
> +    }
> +
> +    pr_dbg("ctx=%p\n", ctx);
> +    *wqe_ctx = ctx;
> +
> +    return 0;
> +}
> +
> +void rm_dealloc_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id)
> +{
> +    res_tbl_dealloc(&dev->wqe_ctx_tbl, (u32) wqe_ctx_id);
> +}
> +
> +int rm_init(PVRDMADev *dev)
> +{
> +    int ret = 0;
> +
> +    ret = res_tbl_init("PD", &dev->pd_tbl, MAX_PDS, sizeof(RmPD));
> +    if (ret != 0) {
> +        goto cln_pds;
> +    }
> +
> +    ret = res_tbl_init("CQ", &dev->cq_tbl, MAX_CQS, sizeof(RmCQ));
> +    if (ret != 0) {
> +        goto cln_cqs;
> +    }
> +
> +    ret = res_tbl_init("MR", &dev->mr_tbl, MAX_MRS, sizeof(RmMR));
> +    if (ret != 0) {
> +        goto cln_mrs;
> +    }
> +
> +    ret = res_tbl_init("QP", &dev->qp_tbl, MAX_QPS, sizeof(RmQP));
> +    if (ret != 0) {
> +        goto cln_qps;
> +    }
> +
> +    ret = res_tbl_init("WQE_CTX", &dev->wqe_ctx_tbl, MAX_QPS * MAX_QP_WRS,
> +               sizeof(void *));
> +    if (ret != 0) {
> +        goto cln_wqe_ctxs;
> +    }
> +
> +    goto out;
> +
> +cln_wqe_ctxs:
> +    res_tbl_free(&dev->wqe_ctx_tbl);
> +
> +cln_qps:
> +    res_tbl_free(&dev->qp_tbl);
> +
> +cln_mrs:
> +    res_tbl_free(&dev->mr_tbl);
> +
> +cln_cqs:
> +    res_tbl_free(&dev->cq_tbl);
> +
> +cln_pds:
> +    res_tbl_free(&dev->pd_tbl);
> +
> +out:
> +    if (ret != 0) {
> +        pr_err("Fail to initialize RM\n");
> +    }
> +
> +    return ret;
> +}
> +
> +void rm_fini(PVRDMADev *dev)
> +{
> +    res_tbl_free(&dev->pd_tbl);
> +    res_tbl_free(&dev->cq_tbl);
> +    res_tbl_free(&dev->mr_tbl);
> +    res_tbl_free(&dev->qp_tbl);
> +    res_tbl_free(&dev->wqe_ctx_tbl);
> +}
> diff --git a/hw/net/pvrdma/pvrdma_rm.h b/hw/net/pvrdma/pvrdma_rm.h
> new file mode 100644
> index 0000000..1d42bc7
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_rm.h
> @@ -0,0 +1,214 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA - Resource Manager
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_RM_H
> +#define PVRDMA_RM_H
> +
> +#include <hw/net/pvrdma/pvrdma_dev_api.h>
> +#include <hw/net/pvrdma/pvrdma-uapi.h>
> +#include <hw/net/pvrdma/pvrdma_ring.h>
> +#include <hw/net/pvrdma/kdbr.h>
> +
> +/* TODO: More then 1 port it fails in ib_modify_qp, maybe something with
> + * the MAC of the second port */
> +#define MAX_PORTS        1 /* Driver force to 1 see pvrdma_add_gid */
> +#define MAX_PORT_GIDS    1
> +#define MAX_PORT_PKEYS   1
> +#define MAX_PKEYS        1
> +#define MAX_PDS          2048
> +#define MAX_CQS          2048
> +#define MAX_CQES         1024 /* cqe size is 64 */
> +#define MAX_QPS          1024
> +#define MAX_GIDS         2048
> +#define MAX_QP_WRS       1024 /* wqe size is 128 */
> +#define MAX_SGES         4
> +#define MAX_MRS          2048
> +#define MAX_AH           1024
> +
> +typedef struct PVRDMADev PVRDMADev;
> +typedef struct KdbrPort KdbrPort;
> +
> +#define MAX_RMRESTBL_NAME_SZ 16
> +typedef struct RmResTbl {
> +    char name[MAX_RMRESTBL_NAME_SZ];
> +    unsigned long *bitmap;
> +    size_t tbl_sz;
> +    size_t res_sz;
> +    void *tbl;
> +    QemuMutex lock;
> +} RmResTbl;
> +
> +enum cq_comp_type {
> +    CCT_NONE,
> +    CCT_SOLICITED,
> +    CCT_NEXT_COMP,
> +};
> +
> +typedef struct RmPD {
> +    __u32 ctx_handle;
> +} RmPD;
> +
> +typedef struct RmCQ {
> +    struct pvrdma_cmd_create_cq init_args;
> +    struct pvrdma_ring *ring_state;
> +    Ring cq;
> +    enum cq_comp_type comp_type;
> +} RmCQ;
> +
> +/* MR (DMA region) */
> +typedef struct RmMR {
> +    __u32 pd_handle;
> +    __u32 lkey;
> +    __u32 rkey;
> +} RmMR;
> +
> +typedef struct RmSqWqe {
> +    struct pvrdma_sq_wqe_hdr hdr;
> +    struct pvrdma_sge sge[0];
> +} RmSqWqe;
> +
> +typedef struct RmRqWqe {
> +    struct pvrdma_rq_wqe_hdr hdr;
> +    struct pvrdma_sge sge[0];
> +} RmRqWqe;
> +
> +typedef struct RmQP {
> +    struct pvrdma_cmd_create_qp init_args;
> +    enum pvrdma_qp_state qp_state;
> +    u8 port_num;
> +    u32 dest_qp_num;
> +    union pvrdma_gid dgid;
> +
> +    struct pvrdma_ring *sq_ring_state;
> +    Ring sq;
> +    struct pvrdma_ring *rq_ring_state;
> +    Ring rq;
> +
> +    unsigned long kdbr_connection_id;
> +} RmQP;
> +
> +typedef struct RmPort {
> +    enum pvrdma_port_state state;
> +    union pvrdma_gid gid_tbl[MAX_PORT_GIDS];
> +    /* TODO: Change type */
> +    int *pkey_tbl;
> +    KdbrPort *kdbr_port;
> +} RmPort;
> +
> +static inline int rm_get_max_port_gids(__u32 *max_port_gids)
> +{
> +    *max_port_gids = MAX_PORT_GIDS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_port_pkeys(__u32 *max_port_pkeys)
> +{
> +    *max_port_pkeys = MAX_PORT_PKEYS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_pkeys(__u16 *max_pkeys)
> +{
> +    *max_pkeys = MAX_PKEYS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_cqs(__u32 *max_cqs)
> +{
> +    *max_cqs = MAX_CQS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_cqes(__u32 *max_cqes)
> +{
> +    *max_cqes = MAX_CQES;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_pds(__u32 *max_pds)
> +{
> +    *max_pds = MAX_PDS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_qps(__u32 *max_qps)
> +{
> +    *max_qps = MAX_QPS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_gids(__u32 *max_gids)
> +{
> +    *max_gids = MAX_GIDS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_qp_wrs(__u32 *max_qp_wrs)
> +{
> +    *max_qp_wrs = MAX_QP_WRS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_sges(__u32 *max_sges)
> +{
> +    *max_sges = MAX_SGES;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_mrs(__u32 *max_mrs)
> +{
> +    *max_mrs = MAX_MRS;
> +    return 0;
> +}
> +
> +static inline int rm_get_phys_port_cnt(__u8 *phys_port_cnt)
> +{
> +    *phys_port_cnt = MAX_PORTS;
> +    return 0;
> +}
> +
> +static inline int rm_get_max_ah(__u32 *max_ah)
> +{
> +    *max_ah = MAX_AH;
> +    return 0;
> +}
> +
> +int rm_init(PVRDMADev *dev);
> +void rm_fini(PVRDMADev *dev);
> +
> +int rm_alloc_pd(PVRDMADev *dev, __u32 *pd_handle, __u32 ctx_handle);
> +void rm_dealloc_pd(PVRDMADev *dev, __u32 pd_handle);
> +
> +RmCQ *rm_get_cq(PVRDMADev *dev, __u32 cq_handle);
> +int rm_alloc_cq(PVRDMADev *dev, struct pvrdma_cmd_create_cq *cmd,
> +        struct pvrdma_cmd_create_cq_resp *resp);
> +void rm_req_notify_cq(PVRDMADev *dev, __u32 cq_handle, u32 flags);
> +void rm_dealloc_cq(PVRDMADev *dev, __u32 cq_handle);
> +
> +int rm_alloc_mr(PVRDMADev *dev, struct pvrdma_cmd_create_mr *cmd,
> +        struct pvrdma_cmd_create_mr_resp *resp);
> +void rm_dealloc_mr(PVRDMADev *dev, __u32 mr_handle);
> +
> +RmQP *rm_get_qp(PVRDMADev *dev, __u32 qp_handle);
> +int rm_alloc_qp(PVRDMADev *dev, struct pvrdma_cmd_create_qp *cmd,
> +        struct pvrdma_cmd_create_qp_resp *resp);
> +int rm_modify_qp(PVRDMADev *dev, __u32 qp_handle,
> +         struct pvrdma_cmd_modify_qp *modify_qp_args);
> +void rm_dealloc_qp(PVRDMADev *dev, __u32 qp_handle);
> +
> +void *rm_get_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id);
> +int rm_alloc_wqe_ctx(PVRDMADev *dev, unsigned long *wqe_ctx_id, void *ctx);
> +void rm_dealloc_wqe_ctx(PVRDMADev *dev, unsigned long wqe_ctx_id);
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_types.h b/hw/net/pvrdma/pvrdma_types.h
> new file mode 100644
> index 0000000..22a7cde
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_types.h
> @@ -0,0 +1,37 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_TYPES_H
> +#define PVRDMA_TYPES_H
> +
> +/* TDOD: All defs here should be removed !!! */
> +
> +#include <stdint.h>
> +#include <asm-generic/int-ll64.h>
> +
> +typedef unsigned char uint8_t;
> +typedef uint64_t dma_addr_t;
> +
> +typedef uint8_t        __u8;
> +typedef uint8_t        u8;
> +typedef unsigned short __u16;
> +typedef unsigned short u16;
> +typedef uint64_t       u64;
> +typedef uint32_t       u32;
> +typedef uint32_t       __u32;
> +typedef int32_t       __s32;
> +#define __bitwise
> +typedef __u64 __bitwise __be64;
> +
> +#endif
> diff --git a/hw/net/pvrdma/pvrdma_utils.c b/hw/net/pvrdma/pvrdma_utils.c
> new file mode 100644
> index 0000000..0f420e2
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_utils.c
> @@ -0,0 +1,36 @@
> +#include <qemu/osdep.h>
> +#include <cpu.h>
> +#include <hw/pci/pci.h>
> +#include <hw/net/pvrdma/pvrdma_utils.h>
> +#include <hw/net/pvrdma/pvrdma.h>
> +
> +void pvrdma_pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len)
> +{
> +    pr_dbg("%p\n", buffer);
> +    pci_dma_unmap(dev, buffer, len, DMA_DIRECTION_TO_DEVICE, 0);
> +}
> +
> +void *pvrdma_pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t plen)
> +{
> +    void *p;
> +    hwaddr len = plen;
> +
> +    if (!addr) {
> +        pr_dbg("addr is NULL\n");
> +        return NULL;
> +    }
> +
> +    p = pci_dma_map(dev, addr, &len, DMA_DIRECTION_TO_DEVICE);
> +    if (!p) {
> +        return NULL;
> +    }
> +
> +    if (len != plen) {
> +        pvrdma_pci_dma_unmap(dev, p, len);
> +        return NULL;
> +    }
> +
> +    pr_dbg("0x%llx -> %p (len=%ld)\n", (long long unsigned int)addr, p, len);
> +
> +    return p;
> +}
> diff --git a/hw/net/pvrdma/pvrdma_utils.h b/hw/net/pvrdma/pvrdma_utils.h
> new file mode 100644
> index 0000000..da01967
> --- /dev/null
> +++ b/hw/net/pvrdma/pvrdma_utils.h
> @@ -0,0 +1,49 @@
> +/*
> + * QEMU VMWARE paravirtual RDMA interface definitions
> + *
> + * Developed by Oracle & Redhat
> + *
> + * Authors:
> + *     Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> + *     Marcel Apfelbaum <marcel@xxxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef PVRDMA_UTILS_H
> +#define PVRDMA_UTILS_H
> +
> +#define pr_info(fmt, ...) \
> +    fprintf(stdout, "%s: %-20s (%3d): " fmt, "pvrdma",  __func__, __LINE__,\
> +           ## __VA_ARGS__)
> +
> +#define pr_err(fmt, ...) \
> +    fprintf(stderr, "%s: Error at %-20s (%3d): " fmt, "pvrdma", __func__, \
> +        __LINE__, ## __VA_ARGS__)
> +
> +#define DEBUG
> +#ifdef DEBUG
> +#define pr_dbg(fmt, ...) \
> +    fprintf(stdout, "%s: %-20s (%3d): " fmt, "pvrdma", __func__, __LINE__,\
> +           ## __VA_ARGS__)
> +#else
> +#define pr_dbg(fmt, ...)
> +#endif
> +
> +static inline int roundup_pow_of_two(int x)
> +{
> +    x--;
> +    x |= (x >> 1);
> +    x |= (x >> 2);
> +    x |= (x >> 4);
> +    x |= (x >> 8);
> +    x |= (x >> 16);
> +    return x + 1;
> +}
> +
> +void pvrdma_pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len);
> +void *pvrdma_pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t plen);
> +
> +#endif
> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
> index d77ca60..a016ad6 100644
> --- a/include/hw/pci/pci_ids.h
> +++ b/include/hw/pci/pci_ids.h
> @@ -167,4 +167,7 @@
>  #define PCI_VENDOR_ID_TEWS               0x1498
>  #define PCI_DEVICE_ID_TEWS_TPCI200       0x30C8
>
> +#define PCI_VENDOR_ID_VMWARE             0x15ad
> +#define PCI_DEVICE_ID_VMWARE_PVRDMA      0x0820
> +
>  #endif
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux