On Mon, Nov 30, 2020 at 5:52 AM Stefan Metzmacher <metze@xxxxxxxxx> wrote:
>
> On 28.11.20 at 20:03, Victor Stewart wrote:
> > On Thu, Nov 26, 2020 at 7:36 AM Stefan Metzmacher <metze@xxxxxxxxx> wrote:
> >>
> >> On 23.11.20 at 17:29, Victor Stewart wrote:
> >>> On Mon, Nov 23, 2020 at 4:13 PM Stefan Metzmacher <metze@xxxxxxxxx> wrote:
> >>>>
> >>>> Hi Victor,
> >>>>
> >>>> wouldn't it be enough to port the PROTO_CMSG_DATA_ONLY check to the sendmsg path?
> >>>>
> >>>> UDP sockets should have PROTO_CMSG_DATA_ONLY set.
> >>>>
> >>>> I guess that would fix your current problem.
> >>>
> >>> that would definitely solve the problem and is the easiest solution.
> >>>
> >>> but PROTO_CMSG_DATA_ONLY is only set on inet_stream_ops and
> >>> inet6_stream_ops but dgram?
> >>
> >> I guess PROTO_CMSG_DATA_ONLY should be added also for dgram sockets.
> >>
> >> Did you intend to remove the cc for the mailing list?
> >>
> >> I think in addition to the io-uring list, cc'ing netdev@xxxxxxxxxxxxxxx
> >> would also be good.
> >
> > whoops forgot to reply all.
> >
> > before I CC netdev, what does PROTO_CMSG_DATA_ONLY actually mean?
>
> I don't really know, but I guess it means that, any supported CMSG type
> on that socket won't do any magic depending on the process state, like
> fd passing with SOL_SOCKET/SCM_RIGHTS or SCM_CREDENTIALS. The CMSG buffer
> would just be a plain byte array, which may only reference state attached
> to the specific socket or packet.
>
> I'd guess that the author and/or reviewers can clarify that, let's see what
> they'll answer.
>
> > I didn't find a clear explanation anywhere by searching the kernel, only
> > that it was defined as 1 and flagged on inet_stream_ops and
> > inet6_stream_ops.
> >
> > there must be a reason it was not initially included for dgrams?
>
> I can't think of any difference; I guess the author just tried to add support for the specific usecase
> that didn't work (MSG_ZEROCOPY in this case, most likely only tested with a tcp workload):
>
> commit 583bbf0624dfd8fc45f1049be1d4980be59451ff
> Author: Luke Hsiao <lukehsiao@xxxxxxxxxx>
> Date: Fri Aug 21 21:41:04 2020 -0700
>
>     io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
>
>     For TCP tx zero-copy, the kernel notifies the process of completions by
>     queuing completion notifications on the socket error queue. This patch
>     allows reading these notifications via recvmsg to support TCP tx
>     zero-copy.
>
>     Ancillary data was originally disallowed due to privilege escalation
>     via io_uring's offloading of sendmsg() onto a kernel thread with kernel
>     credentials (https://crbug.com/project-zero/1975). So, we must ensure
>     that the socket type is one where the ancillary data types that are
>     delivered on recvmsg are plain data (no file descriptors or values that
>     are translated based on the identity of the calling process).

Thank you for CCing us. The reason for PROTO_CMSG_DATA_ONLY is explained
in the paragraph above in the commit message. PROTO_CMSG_DATA_ONLY
allow-lists a protocol that is guaranteed not to be exposed to the
privilege escalation in https://crbug.com/project-zero/1975. TCP doesn't
have that issue, and I believe UDP doesn't have that issue either (but
please audit and confirm that with +Jann Horn).

If you can't find any non-data CMSGs for UDP, you should just add
PROTO_CMSG_DATA_ONLY to inet dgram sockets, as Stefan suggested, instead
of introducing __sys_whitelisted_cmsghdrs.

Thanks,
Soheil

>     This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
>     with tx zero-copy enabled. Before this patch, we received -EINVALID from
>     this specific code path. After this patch, we could read tcp tx
>     zero-copy completion notifications from the MSG_ERRQUEUE.
>
>     Signed-off-by: Soheil Hassas Yeganeh <soheil@xxxxxxxxxx>
>     Signed-off-by: Arjun Roy <arjunroy@xxxxxxxxxx>
>     Acked-by: Eric Dumazet <edumazet@xxxxxxxxxx>
>     Reviewed-by: Jann Horn <jannh@xxxxxxxxxx>
>     Reviewed-by: Jens Axboe <axboe@xxxxxxxxx>
>     Signed-off-by: Luke Hsiao <lukehsiao@xxxxxxxxxx>
>     Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> > but yes if there's nothing standing in the way of adding it for
> > dgrams, and it covers UDP_SEGMENT and UDP_GRO then that's of course
> > the least friction solution here.
>
> Yes, it would avoid whitelisting new specific usecases.
>
> metze
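
To make the allow-list idea concrete, here is a minimal userspace sketch of the kind of gate being discussed. It is not the kernel source: mock_proto_ops, mock_msghdr and mock_check_cmsg are hypothetical stand-ins invented for this illustration. Only the flag name PROTO_CMSG_DATA_ONLY and its value of 1 come from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag name from the thread; Victor notes it "was defined as 1". */
#define PROTO_CMSG_DATA_ONLY 0x0001
static_assert(PROTO_CMSG_DATA_ONLY == 1, "value noted in the thread");

/* Hypothetical, simplified stand-in for the per-socket-type ops table. */
struct mock_proto_ops {
    unsigned int flags;
};

/* Hypothetical, simplified stand-in for the control part of a msghdr. */
struct mock_msghdr {
    void  *msg_control;
    size_t msg_controllen;
};

/*
 * Returns 0 if the request may proceed, or -EINVAL if it carries
 * ancillary data on a socket type that has not been allow-listed
 * as "plain data only".
 */
int mock_check_cmsg(const struct mock_proto_ops *ops,
                    const struct mock_msghdr *msg)
{
    if (msg->msg_control || msg->msg_controllen) {
        /* Ancillary data requested: only allowed if cmsgs are plain data. */
        if (!(ops->flags & PROTO_CMSG_DATA_ONLY))
            return -EINVAL;
    }
    return 0;
}
```

With these stand-ins, an ops table that has the flag set accepts a message carrying control data, while one without the flag gets -EINVAL. A message with no control data passes either way. Setting the flag on the dgram ops corresponds to the change proposed above, subject to the audit for non-data CMSGs.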