> How you create a QP owned by the kernel but linked to a PD owned by uverbs is going to need very delicate and careful work to be somehow compatible with our disassociation model.

The primary usage mode is as follows:

The RC QP, PD and MR are all in the kernel. The buffer's virtual address and length are supplied by the user process and then used to look up an MR in the cache; upon a miss, a kernel MR is created against the kernel PD. There are separate MR caches per user process. The IOs are initiated by the user and matched to an MR in the cache, then an RDMA Write w/Immed is posted on the kernel RC QP.

In concept, this is not unlike other kernel ULPs which perform direct data placement into user memory but use kernel QPs for connections to remote resources, such as various RDMA storage and filesystem ULPs. Keeping the MR registration call separate from the IO allows the registration cost of a miss to be partially hidden behind the end-to-end RTS/CTS exchange, which occurs in user space.

There is a secondary usage mode where the MRs are cached, but are created against a user PD and later used by the user process against user-space QPs. We found that this mode offered some slight latency advantages over the primary mode for tiny jobs, but suffered significant scalability issues. Those latency advantages mainly manifested in microbenchmarks, but did help a few apps.

If it would simplify things, we could focus the discussion on the primary usage mode. Conceptually, the secondary usage mode may be a good candidate for an extension to uverbs (some form of register MR w/caching API where register MR checks a cache and deregister MR merely decrements a reference count in the cache).

Todd
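
For illustration only, the register/deregister-with-caching semantics suggested above could be sketched in plain C roughly as follows. This is a user-space toy, not kernel code and not an existing uverbs API: the names mr_entry, mr_cache, mr_cache_get, mr_cache_put and mr_cache_destroy are hypothetical, the lookup assumes an exact (address, length) match, and the actual MR registration is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached MR; a real entry would hold the MR handle. */
struct mr_entry {
    uintptr_t addr;          /* user buffer virtual address */
    size_t len;              /* user buffer length */
    unsigned int refcount;   /* outstanding users of this entry */
    struct mr_entry *next;
};

/* Hypothetical per-process cache (one instance per user process). */
struct mr_cache {
    struct mr_entry *head;
    unsigned long hits;
    unsigned long misses;
};

/* "Register": return a cached entry on a hit, create one on a miss. */
static struct mr_entry *mr_cache_get(struct mr_cache *c, uintptr_t addr, size_t len)
{
    struct mr_entry *e;

    for (e = c->head; e; e = e->next) {
        /* Assumption: exact match only; a real cache might match containing ranges. */
        if (e->addr == addr && e->len == len) {
            e->refcount++;
            c->hits++;
            return e;
        }
    }

    /* Miss: this is where the real MR registration against the PD would occur. */
    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->addr = addr;
    e->len = len;
    e->refcount = 1;
    e->next = c->head;
    c->head = e;
    c->misses++;
    return e;
}

/* "Deregister": only drops a reference; the entry stays cached for reuse. */
static void mr_cache_put(struct mr_cache *c, struct mr_entry *e)
{
    (void)c;
    assert(e->refcount > 0);
    e->refcount--;
}

/* Tear down the whole cache, e.g. when the owning process goes away. */
static void mr_cache_destroy(struct mr_cache *c)
{
    while (c->head) {
        struct mr_entry *e = c->head;
        c->head = e->next;
        free(e);
    }
}
```

In this sketch a second mr_cache_get() on the same buffer returns the same entry with its reference count raised, and an entry whose count has dropped to zero remains cached until mr_cache_destroy(). Eviction, locking and invalidation when the user mapping changes are deliberately left out.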