On Sun, Dec 1, 2024 at 7:51 PM D. Wythe <alibuda@xxxxxxxxxxxxxxxxx> wrote:
> > +int homa_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
> > +		    unsigned int optlen)
> > +{
> > +	struct homa_sock *hsk = homa_sk(sk);
> > +	struct homa_set_buf_args args;
> > +	int ret;
> > +
> > +	if (level != IPPROTO_HOMA || optname != SO_HOMA_SET_BUF ||
> > +	    optlen != sizeof(struct homa_set_buf_args))
> > +		return -EINVAL;
>
> SO_HOMA_SET_BUF is a bit odd here, maybe HOMA_RCVBUF ? which also can be
> implemented in getsockopt.

I have changed it to HOMA_RCVBUF (and renamed struct homa_set_buf_args to
struct homa_rcvbuf_args). I also implemented getsockopt for HOMA_RCVBUF.

> > +
> > +	if (copy_from_sockptr(&args, optval, optlen))
> > +		return -EFAULT;
> > +
> > +	/* Do a trivial test to make sure we can at least write the first
> > +	 * page of the region.
> > +	 */
> > +	if (copy_to_user((__force void __user *)args.start, &args, sizeof(args)))
> > +		return -EFAULT;
>
> To share buffer between kernel and userspace, maybe you should refer to
> the implementation of io_pin_pbuf_ring()

I'm not sure what you mean here. Are you suggesting that I look at the code
of io_pin_pbuf_ring to make sure I've done everything needed to share
buffers? I don't believe that Homa needs to do anything special (e.g. it
doesn't need to pin the user's buffers); it just saves the address and makes
copy_to_user calls later when needed (and these calls are all done at
syscall level in the context of the application). Is there something I'm
missing?

> > +
> > +	homa_sock_lock(hsk, "homa_setsockopt SO_HOMA_SET_BUF");
> > +	ret = homa_pool_init(hsk, (__force void __user *)args.start, args.length);
> > +	homa_sock_unlock(hsk);
>
> It seems that if the option was not set, poll->hsk will not be initialized
> and then if recvmsg is called, a panic will be triggered.

I fixed homa_pool_check_waiting to properly handle empty pools.
> > +/**
> > + * homa_sendmsg() - Send a request or response message on a Homa socket.
> > + * @sk:     Socket on which the system call was invoked.
> > + * @msg:    Structure describing the message to send; the msg_control
> > + *          field points to additional information.
> > + * @length: Number of bytes of the message.
> > + * Return:  0 on success, otherwise a negative errno.
> > + */
> > +int homa_sendmsg(struct sock *sk, struct msghdr *msg, size_t length)
> > +{
> > +	struct homa_sock *hsk = homa_sk(sk);
> > +	struct homa_sendmsg_args args;
> > +	int result = 0;
> > +	struct homa_rpc *rpc = NULL;
> > +	union sockaddr_in_union *addr = (union sockaddr_in_union *)msg->msg_name;
>
> msg->msg_name can be NULL.

I have added a check for this.

> > +	if (addr->in6.sin6_family != sk->sk_family) {
> > +		result = -EAFNOSUPPORT;
> > +		goto error;
> > +	}
>
> addr->sa.sa_family? Using sin6_family would be odd to me, making it seem
> like homa can only run with IPv6.

The family is in the same place for all protocols so I thought it would be
safe to reference it as in6.sin6_family. But you're right that it looks odd,
so I changed it to addr->sa.sa_family.

> > +	rpc = homa_rpc_new_client(hsk, addr);
> > +	if (IS_ERR(rpc)) {
> > +		result = PTR_ERR(rpc);
> > +		rpc = NULL;
> > +		goto error;
> > +	}
> > +	rpc->completion_cookie = args.completion_cookie;
> > +	result = homa_message_out_fill(rpc, &msg->msg_iter, 1);
> > +	if (result)
> > +		goto error;
> > +	args.id = rpc->id;
> > +	homa_rpc_unlock(rpc);
>
> Strongly recommend that adding comments with all the unlock part to
> indicate where it was locked from.

I went through the code and added comments in places where the locking isn't
relatively obvious (e.g. it happens through methods without "lock" in their
name, such as homa_rpc_new_client). I also renamed homa_try_bucket_lock to
homa_try_rpc_lock, which will clarify RPC locking a bit better.
> > +	canonical_dest = canonical_ipv6_addr(addr);
>
> Are you treating all addresses as IPv6 addresses just for the sake of
> simplicity? It's a bit odd, but okay to me.

Yes: Homa uses IPv6 addresses internally for both IPv4 and IPv6; this allows
Homa to work with both IPv4 and IPv6 with only a small amount of code that
is protocol-specific.

> > +int homa_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
> > +		 int *addr_len)
> > +{
> > +	struct homa_sock *hsk = homa_sk(sk);
> > +	struct homa_recvmsg_args control;
> > +	struct homa_rpc *rpc;
> > +	int result;
> > +
> > +	if (unlikely(!msg->msg_control)) {
> > +		/* This test isn't strictly necessary, but it provides a
> > +		 * hook for testing kernel call times.
> > +		 */
> > +		return -EINVAL;
> > +	}
> > +	if (msg->msg_controllen != sizeof(control)) {
> > +		result = -EINVAL;
> > +		goto done;
>
> Then you copied an uninitialized control into userspace ...
> Is goto really necessary ? Maybe just return.

Good catch; I changed it to just return. In general, I prefer to funnel
everything through a single cleanup-and-return point rather than returning
directly in some cases and jumping to a cleanup point in others; I think
that makes the code cleaner and more obvious. But that doesn't really work
here.

> > +	}
> > +	if (unlikely(copy_from_user(&control,
> > +				    (__force void __user *)msg->msg_control,
> > +				    sizeof(control)))) {
> > +		result = -EFAULT;
> > +		goto done;
>
> Same as above, is goto really necessary ?

I changed it to return directly.

> > +	}
> > +	control.completion_cookie = 0;
> > +
> > +	if (control.num_bpages > HOMA_MAX_BPAGES ||
> > +	    (control.flags & ~HOMA_RECVMSG_VALID_FLAGS)) {
> > +		result = -EINVAL;
> > +		goto done;
> > +	}
> > +	homa_pool_release_buffers(hsk->buffer_pool, control.num_bpages,
> > +				  control.bpage_offsets);
>
> homa_pool_release_buffers() quietly ignores erroneous bpage_index values
> passed in from the userspace.
> This behavior may obscure more complex issues in userspace, and exposing
> this problem could help users identify issues earlier.

Good point; homa_recvmsg now returns -EINVAL if there are erroneous
bpage_index values passed in.

> > +	control.num_bpages = 0;
> > +
> > +	rpc = homa_wait_for_message(hsk, (flags & MSG_DONTWAIT)
> > +			? (control.flags | HOMA_RECVMSG_NONBLOCKING)
> > +			: control.flags, control.id);
> > +	if (IS_ERR(rpc)) {
> > +		/* If we get here, it means there was an error that prevented
> > +		 * us from finding an RPC to return. If there's an error in
> > +		 * the RPC itself we won't get here.
> > +		 */
> > +		result = PTR_ERR(rpc);
> > +		goto done;
> > +	}
> > +	result = rpc->error ? rpc->error : rpc->msgin.length;
>
> A trivial tip:
>
>     result = rpc->error ?: rpc->msgin.length;

I wasn't aware that the shorter form existed. However, I find the longer
form a bit more obvious, so I'd prefer to leave that unless you feel
strongly that it should be abbreviated.

> > +
> > +	/* Collect result information. */
> > +	control.id = rpc->id;
> > +	control.completion_cookie = rpc->completion_cookie;
> > +	if (likely(rpc->msgin.length >= 0)) {
> > +		control.num_bpages = rpc->msgin.num_bpages;
> > +		memcpy(control.bpage_offsets, rpc->msgin.bpage_offsets,
> > +		       sizeof(control.bpage_offsets));
>
> A trivial tip: although sizeof(control.bpage_offsets) and
> sizeof(rpc->msgin.bpage_offsets) are the same, passing
> sizeof(rpc->msgin.bpage_offsets) would be more appropriate.

Done.

Thanks for the comments; lots of good catches there.

-John-