Hi, Le jeudi 29 janvier 2015 à 11:28 -0700, Jason Gunthorpe a écrit : > On Thu, Jan 29, 2015 at 06:59:58PM +0100, Yann Droneaud wrote: > > As specified in "Extending Verbs API" presentation [1] by Tzahi Oved > > during OFA International Developer Workshop 2013, the request's > > comp_mask should describe the request data: it's describe the > > availability of extended fields in the request. > > Conversely, the response's comp_mask should describe the presence > > of extended fields in the response. > > Roland: I agree with Yann, these patches need to go in, or the ODP > patches reverted. > Reverting all On Demand Paging patches seems overkill: if something as to be reverted it should be commit 5a77abf9a97a ("IB/core: Add support for extended query device caps") and the part of commit 860f10a799c8 ("IB/core: Add flags for on demand paging support") which modify ib_uverbs_ex_query_device(). But I wonder about this part of commit 860f10a799c8: diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index c7a43624c96b..f9326ccda4b5 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -953,6 +953,18 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, goto err_free; } + if (cmd.access_flags & IB_ACCESS_ON_DEMAND) { + struct ib_device_attr attr; + + ret = ib_query_device(pd->device, &attr); + if (ret || !(attr.device_cap_flags & + IB_DEVICE_ON_DEMAND_PAGING)) { + pr_debug("ODP support not available\n"); + ret = -EINVAL; + goto err_put; + } + } + AFAICT (1 << 6) bit from struct ib_uverbs_reg_mr access_flags field was not enforced to be 0 previously, as ib_check_mr_access() only check for some coherency between a subset of the bits (it's not a function dedicated to check flags provided by userspace): include/rdma/ib_verbs.h: 1094 enum ib_access_flags { 1095 IB_ACCESS_LOCAL_WRITE = 1, 1096 IB_ACCESS_REMOTE_WRITE = (1<<1), 1097 IB_ACCESS_REMOTE_READ = (1<<2), 1098 IB_ACCESS_REMOTE_ATOMIC = (1<<3), 1099 IB_ACCESS_MW_BIND = (1<<4), 1100 IB_ZERO_BASED = (1<<5), 1101 IB_ACCESS_ON_DEMAND = (1<<6), 1102 }; drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr() 961 ret = ib_check_mr_access(cmd.access_flags); 962 if (ret) 963 return ret; include/rdma/ib_verbs.h: 2643 static inline int ib_check_mr_access(int flags) 2644 { 2645 /* 2646 * Local write permission is required if remote write or 2647 * remote atomic permission is also requested. 2648 */ 2649 if (flags & (IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_REMOTE_WRITE) && 2650 !(flags & IB_ACCESS_LOCAL_WRITE)) 2651 return -EINVAL; 2652 2653 return 0; 2654 } drivers/infiniband/core/uverbs_cmd.c: ib_uverbs_reg_mr() 990 mr = pd->device->reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va, 991 cmd.access_flags, &udata); reg_user_mr() functions may call ib_umem_get() and pass access_flags to it: drivers/infiniband/core/umem.c: ib_umem_get() 114 /* 115 * We ask for writable memory if any of the following 116 * access flags are set. "Local write" and "remote write" 117 * obviously require write access. "Remote atomic" can do 118 * things like fetch and add, which will modify memory, and 119 * "MW bind" can change permissions by binding a window. 120 */ 121 umem->writable = !!(access & 122 (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | 123 IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND)); 124 125 if (access & IB_ACCESS_ON_DEMAND) { 126 ret = ib_umem_odp_get(context, umem); 127 if (ret) { 128 kfree(umem); 129 return ERR_PTR(ret); 130 } 131 return umem; 132 } As you can see only a few bits in access_flags are checked in the end, so it may exist a very unlikely possibility that an existing userspace program is setting this IB_ACCESS_ON_DEMAND bit without the intention of enabling on demand paging as this would be unnoticed by older kernel. In the other hand, a newer program built with on-demand-paging in mind will set the bit, but when run on older kernel, it will be a no-op, allowing the program to continue, perhaps thinking on-demand-paging is available. That should be avoided as much as possible. Unfortunately, I think this cannot be fixed as it's was long since IB_ZERO_BASED was added by commit 7083e42ee2 ("IB/core: Add "type 2" memory windows support"). Anyway there was no check for IB_ACCESS_REMOTE_READ, nor IB_ACCESS_MW_BIND in the uverb layer either. So, just as the second argument of open() syscall (remember O_TMPFILE, see http://lwn.net/Articles/562294/ ), we will have to live with and be careful ... Regards. -- Yann Droneaud OPTEYA -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html