On Sun, Feb 18, 2018 at 01:44:42PM +0000, Richard Haines wrote:
> On Fri, 2018-02-16 at 23:28 -0500, Neil Horman wrote:
> > On Fri, Feb 16, 2018 at 07:51:02PM -0200, Marcelo Ricardo Leitner wrote:
> > > On Fri, Feb 16, 2018 at 03:14:35PM -0500, Neil Horman wrote:
> > > > On Fri, Feb 16, 2018 at 10:56:07AM -0200, Marcelo Ricardo Leitner wrote:
> > > > > On Thu, Feb 15, 2018 at 09:15:40AM -0500, Neil Horman wrote:
> > > > > > On Tue, Feb 13, 2018 at 08:54:44PM +0000, Richard Haines wrote:
> > > > > > > Add ip option support to allow LSM security modules to utilise CIPSO/IPv4
> > > > > > > and CALIPSO/IPv6 services.
> > > > > > >
> > > > > > > Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
> > > > > > > ---
> > > > > > >  include/net/sctp/sctp.h    |  4 +++-
> > > > > > >  include/net/sctp/structs.h |  2 ++
> > > > > > >  net/sctp/chunk.c           | 12 +++++++-----
> > > > > > >  net/sctp/ipv6.c            | 42 +++++++++++++++++++++++++++++++++++-------
> > > > > > >  net/sctp/output.c          |  5 ++++-
> > > > > > >  net/sctp/protocol.c        | 36 ++++++++++++++++++++++++++++++++++++
> > > > > > >  net/sctp/socket.c          | 14 ++++++++++----
> > > > > > >  7 files changed, 97 insertions(+), 18 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> > > > > > > index f7ae6b0..25c5c87 100644
> > > > > > > --- a/include/net/sctp/sctp.h
> > > > > > > +++ b/include/net/sctp/sctp.h
> > > > > > > @@ -441,9 +441,11 @@ static inline int sctp_list_single_entry(struct list_head *head)
> > > > > > >  static inline int sctp_frag_point(const struct sctp_association *asoc, int pmtu)
> > > > > > >  {
> > > > > > >  	struct sctp_sock *sp = sctp_sk(asoc->base.sk);
> > > > > > > +	struct sctp_af *af = sp->pf->af;
> > > > > > >  	int frag = pmtu;
> > > > > > >
> > > > > > > -	frag -= sp->pf->af->net_header_len;
> > > > > > > +	frag -= af->ip_options_len(asoc->base.sk);
> > > > > > > +	frag -= af->net_header_len;
> > > > > > >  	frag -= sizeof(struct sctphdr) + sctp_datachk_len(&asoc->stream);
> > > > > > >
> > > > > > >  	if (asoc->user_frag)
> > > > > > > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> > > > > > > index 03e92dd..ead5fce 100644
> > > > > > > --- a/include/net/sctp/structs.h
> > > > > > > +++ b/include/net/sctp/structs.h
> > > > > > > @@ -491,6 +491,7 @@ struct sctp_af {
> > > > > > >  	void (*ecn_capable)(struct sock *sk);
> > > > > > >  	__u16 net_header_len;
> > > > > > >  	int sockaddr_len;
> > > > > > > +	int (*ip_options_len)(struct sock *sk);
> > > > > > >  	sa_family_t sa_family;
> > > > > > >  	struct list_head list;
> > > > > > >  };
> > > > > > > @@ -515,6 +516,7 @@ struct sctp_pf {
> > > > > > >  	int (*addr_to_user)(struct sctp_sock *sk, union sctp_addr *addr);
> > > > > > >  	void (*to_sk_saddr)(union sctp_addr *, struct sock *sk);
> > > > > > >  	void (*to_sk_daddr)(union sctp_addr *, struct sock *sk);
> > > > > > > +	void (*copy_ip_options)(struct sock *sk, struct sock *newsk);
> > > > > > >  	struct sctp_af *af;
> > > > > > >  };
> > > > > > >
> > > > > > > diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> > > > > > > index 991a530..d5c0ef7 100644
> > > > > > > --- a/net/sctp/chunk.c
> > > > > > > +++ b/net/sctp/chunk.c
> > > > > > > @@ -154,7 +154,6 @@ static void sctp_datamsg_assign(struct sctp_datamsg *msg, struct sctp_chunk *chu
> > > > > > >  	chunk->msg = msg;
> > > > > > >  }
> > > > > > >
> > > > > > > -
> > > > > > >  /* A data chunk can have a maximum payload of (2^16 - 20). Break
> > > > > > >   * down any such message into smaller chunks. Opportunistically, fragment
> > > > > > >   * the chunks down to the current MTU constraints. We may get refragmented
> > > > > > > @@ -171,6 +170,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
> > > > > > >  	struct list_head *pos, *temp;
> > > > > > >  	struct sctp_chunk *chunk;
> > > > > > >  	struct sctp_datamsg *msg;
> > > > > > > +	struct sctp_sock *sp;
> > > > > > > +	struct sctp_af *af;
> > > > > > >  	int err;
> > > > > > >
> > > > > > >  	msg = sctp_datamsg_new(GFP_KERNEL);
> > > > > > > @@ -189,9 +190,11 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
> > > > > > >  	/* This is the biggest possible DATA chunk that can fit into
> > > > > > >  	 * the packet
> > > > > > >  	 */
> > > > > > > -	max_data = asoc->pathmtu -
> > > > > > > -		   sctp_sk(asoc->base.sk)->pf->af->net_header_len -
> > > > > > > -		   sizeof(struct sctphdr) - sctp_datachk_len(&asoc->stream);
> > > > > > > +	sp = sctp_sk(asoc->base.sk);
> > > > > > > +	af = sp->pf->af;
> > > > > > > +	max_data = asoc->pathmtu - af->net_header_len -
> > > > > > > +		   sizeof(struct sctphdr) - sctp_datachk_len(&asoc->stream) -
> > > > > > > +		   af->ip_options_len(asoc->base.sk);
> > > > > > >  	max_data = SCTP_TRUNC4(max_data);
> > > > > > >
> > > > > > >  	/* If the the peer requested that we authenticate DATA chunks
> > > > > > > @@ -211,7 +214,6 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
> > > > > > >
> > > > > > >  	/* Set first_len and then account for possible bundles on first frag */
> > > > > > >  	first_len = max_data;
> > > > > > > -
> > > > > > >  	/* Check to see if we have a pending SACK and try to let it be bundled
> > > > > > >  	 * with this message. Do this if we don't have any data queued already.
> > > > > > >  	 * To check that, look at out_qlen and retransmit list.
> > > > > > > diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> > > > > > > index e35d4f7..0b0f895 100644
> > > > > > > --- a/net/sctp/ipv6.c
> > > > > > > +++ b/net/sctp/ipv6.c
> > > > > > > @@ -427,6 +427,38 @@ static void sctp_v6_copy_addrlist(struct list_head *addrlist,
> > > > > > >  	rcu_read_unlock();
> > > > > > >  }
> > > > > > >
> > > > > > > +/* Copy over any ip options */
> > > > > > > +static void sctp_v6_copy_ip_options(struct sock *sk, struct sock *newsk)
> > > > > > > +{
> > > > > > > +	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
> > > > > > > +	struct ipv6_txoptions *opt;
> > > > > > > +
> > > > > > > +	newnp = inet6_sk(newsk);
> > > > > > > +
> > > > > > > +	rcu_read_lock();
> > > > > > > +	opt = rcu_dereference(np->opt);
> > > > > > > +	if (opt)
> > > > > > > +		opt = ipv6_dup_options(newsk, opt);
> > > > > >
> > > > > > do you want to print a warning here in the event the allocation
> > > > > > for the dup operation fails?
> > > > > >
> > > > > > > +	RCU_INIT_POINTER(newnp->opt, opt);
> > > > > > > +	rcu_read_unlock();
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Account for the IP options */
> > > > > > > +static int sctp_v6_ip_options_len(struct sock *sk)
> > > > > > > +{
> > > > > > > +	struct ipv6_pinfo *np = inet6_sk(sk);
> > > > > > > +	struct ipv6_txoptions *opt;
> > > > > > > +	int len = 0;
> > > > > > > +
> > > > > > > +	rcu_read_lock();
> > > > > > > +	opt = rcu_dereference(np->opt);
> > > > > > > +	if (opt)
> > > > > > > +		len = opt->opt_flen + opt->opt_nflen;
> > > > > > > +
> > > > > > > +	rcu_read_unlock();
> > > > > > > +	return len;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Initialize a sockaddr_storage from in incoming skb. */
> > > > > > >  static void sctp_v6_from_skb(union sctp_addr *addr, struct sk_buff *skb,
> > > > > > >  			     int is_saddr)
> > > > > > > @@ -666,7 +698,6 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
> > > > > > >  	struct sock *newsk;
> > > > > > >  	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
> > > > > > >  	struct sctp6_sock *newsctp6sk;
> > > > > > > -	struct ipv6_txoptions *opt;
> > > > > > >
> > > > > > >  	newsk = sk_alloc(sock_net(sk), PF_INET6, GFP_KERNEL, sk->sk_prot, kern);
> > > > > > >  	if (!newsk)
> > > > > > > @@ -689,12 +720,7 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
> > > > > > >  	newnp->ipv6_ac_list = NULL;
> > > > > > >  	newnp->ipv6_fl_list = NULL;
> > > > > > >
> > > > > > > -	rcu_read_lock();
> > > > > > > -	opt = rcu_dereference(np->opt);
> > > > > > > -	if (opt)
> > > > > > > -		opt = ipv6_dup_options(newsk, opt);
> > > > > > > -	RCU_INIT_POINTER(newnp->opt, opt);
> > > > > > > -	rcu_read_unlock();
> > > > > > > +	sctp_v6_copy_ip_options(sk, newsk);
> > > > > > >
> > > > > > >  	/* Initialize sk's sport, dport, rcv_saddr and daddr for getsockname()
> > > > > > >  	 * and getpeername().
> > > > > > > @@ -1041,6 +1067,7 @@ static struct sctp_af sctp_af_inet6 = {
> > > > > > >  	.ecn_capable = sctp_v6_ecn_capable,
> > > > > > >  	.net_header_len = sizeof(struct ipv6hdr),
> > > > > > >  	.sockaddr_len = sizeof(struct sockaddr_in6),
> > > > > > > +	.ip_options_len = sctp_v6_ip_options_len,
> > > > > > >  #ifdef CONFIG_COMPAT
> > > > > > >  	.compat_setsockopt = compat_ipv6_setsockopt,
> > > > > > >  	.compat_getsockopt = compat_ipv6_getsockopt,
> > > > > > > @@ -1059,6 +1086,7 @@ static struct sctp_pf sctp_pf_inet6 = {
> > > > > > >  	.addr_to_user = sctp_v6_addr_to_user,
> > > > > > >  	.to_sk_saddr = sctp_v6_to_sk_saddr,
> > > > > > >  	.to_sk_daddr = sctp_v6_to_sk_daddr,
> > > > > > > +	.copy_ip_options = sctp_v6_copy_ip_options,
> > > > > > >  	.af = &sctp_af_inet6,
> > > > > > >  };
> > > > > > >
> > > > > > > diff --git a/net/sctp/output.c b/net/sctp/output.c
> > > > > > > index 01a26ee..668e2fa 100644
> > > > > > > --- a/net/sctp/output.c
> > > > > > > +++ b/net/sctp/output.c
> > > > > > > @@ -151,7 +151,10 @@ void sctp_packet_init(struct sctp_packet *packet,
> > > > > > >  	INIT_LIST_HEAD(&packet->chunk_list);
> > > > > > >  	if (asoc) {
> > > > > > >  		struct sctp_sock *sp = sctp_sk(asoc->base.sk);
> > > > > > > -		overhead = sp->pf->af->net_header_len;
> > > > > > > +		struct sctp_af *af = sp->pf->af;
> > > > > > > +
> > > > > > > +		overhead = af->net_header_len +
> > > > > > > +			   af->ip_options_len(asoc->base.sk);
> > > > > > >  	} else {
> > > > > > >  		overhead = sizeof(struct ipv6hdr);
> > > > > > >  	}
> > > > > >
> > > > > > I'm a bit worried about this mechanism.
> > > > > > Unlike tcp or udp, where a packet is allocated and its options
> > > > > > field is pushed within the same call stack (or more notably,
> > > > > > during a single cycle in which the sock lock is held), sctp
> > > > > > allocates a packet here, and holds it for potentially multiple
> > > > > > calls from userspace while chunks are collected and added to it.
> > > > > > During those multiple calls the socket
> > > > >
> > > > > Not sure if you simplified it here but that's not exactly how it
> > > > > works. The packet is not built chunk by chunk per sendmsg() call
> > > > > as you described, but instead it will collect the chunks in a list
> > > > > (outq) up to the point that it notices that it's time to send the
> > > > > packet. Then, it will call sctp_outq_flush(), which will assemble
> > > > > the packet and send it to the IP layer as needed. Chunks that won't
> > > > > fit in the packet and will only be sent later aren't added to any
> > > > > packet but remain on the outq list.
> > > > >
> > > >
> > > > Yes, I simplified it, and yes, given that I've maintained this code
> > > > since 2012, I know how it works.
> > >
> > > That's really not how I meant it. I had to read the paragraph 3 times
> > > before seeing the simplification. But Richard is not that acquainted
> > > with the code and the simplification was, to say the least, risky for
> > > his understanding and the implementation he is doing.
> > >
> > > > > The packet is never freed, it's embedded into the transport. It's
> > > > > just reconfigured.
> > > > >
> > > > > Nevertheless, I agree the issue is there.
> > > > >
> > > >
> > > > Which is really the salient point.
> > > >
> > > > > > lock is released and reacquired, during which time the set of
> > > > > > configured ip options might change. Then when the packet is
> > > > > > passed to the ip layer and the options copied into the packet,
> > > > > > we might have a different option length leading to an
> > > > > > skb_over_panic.
> > > > > >
> > > > > > Suggest that it might be better to buffer any changes in options
> > > > > > and only have them take effect any time a new packet is allocated.
> > > > >
> > > > > s/allocated/configured/ :)
> > > >
> > > > Seriously? You clearly knew what I was saying. I understand I
> > > > misused the term,
> > >
> > > yes, ^ I knew, but I doubt Richard also knew
> > >
> > > Sorry if somehow I had this connotation. That wasn't the idea.
> > >
> >
> > It's ok, I'm sorry as well. You were responding to me, and I perfectly
> > well know how this code works, so I assumed you were trying to explain
> > it to me.
> >
> > > > but do you really want to harp on it?
> > >
> > > No. Just want Richard to understand what is meant here.
> > > sctp_packet_init and sctp_packet_config are two distinct moments in
> > > there.
> > >
> > > > >
> > > > > I think we can fix it by moving this code to sctp_packet_config()
> > > > > instead. On a quick check here, seems all packet->overhead
> > > > > references are after it gets called for a packet that is about to
> > > > > be built.
> > > > >
> > > >
> > > > Yeah, I agree we could move it there, though I think I would prefer
> > > > to see it in sctp_packet_transmit. If we do it there, then we only
> > > > have to compute the overhead size once before building the skb (that
> > > > is to say, if we do it in packet_config, we potentially compute it
> > > > multiple times if we change transports). In fact, if we do it in
> > > > sctp_packet_transmit, then we potentially can eliminate the overhead
> > > > member from the sctp_packet struct entirely, as we can just store the
> > > > computed overhead in a stack variable and use it in the skb_reserve
> > > > call.
> > >
> > > In sctp_packet_transmit I think it would be too late, because we have
> > > to know the entire overhead up front in order to know if the chunks
> > > that are getting enlisted on the packet actually fit in there.
> > >
> > > Marcelo
> >
> > That's a fair point, but we have a backoff path (the err label) in
> > sctp_packet_transmit. If the ip options mean a chunk doesn't fit, we
> > follow the error path, reset the packet, and try again. It's slow, to
> > be sure, but I wonder which trade-off is a net gain (i.e. is it better
> > to hit the rare case where ip options change and cause a packet to get
> > discarded and retransmitted, or better to somewhat more frequently
> > recompute the overhead length).
> >
> > I suppose it's a bit academic. We're talking about a few memory
> > dereferences and an add or two. Let's just go with sctp_packet_config
> > for the overhead computation location.
> >
> > Neil
>
> Guys, Thanks for all the comments, I did need the beginner's guide.
> Currently testing patches and will post a new V7 "Add ip option
> support" patch early next week for comment.
>

Thank you!
Neil
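
For readers following the overhead arithmetic agreed on above, here is a minimal userspace sketch; it is not the kernel code. It models sampling the IP option length when a packet is configured, then deriving the largest DATA payload from the path MTU. Every `model_*` name and `MODEL_SCTPHDR_LEN` is hypothetical and invented for illustration. The 12-byte SCTP common header and the 16-byte DATA chunk header used in the note below are taken from RFC 4960, not from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for illustration only; these are not
 * the kernel's struct sock, struct sctp_af or struct sctp_packet.
 */
struct model_sock {
	size_t ip_options_len;	/* bytes of IP options currently set */
};

struct model_af {
	size_t net_header_len;	/* 20 for IPv4, 40 for IPv6 */
	size_t (*ip_options_len)(const struct model_sock *sk);
};

struct model_packet {
	size_t overhead;	/* network header + IP options + SCTP header */
};

#define MODEL_SCTPHDR_LEN 12	/* SCTP common header size (RFC 4960) */

static size_t model_ip_options_len(const struct model_sock *sk)
{
	return sk->ip_options_len;
}

/* Sample the option length when the packet is (re)configured, so the
 * stored overhead reflects the options in force at that moment.
 */
static void model_packet_config(struct model_packet *pkt,
				const struct model_af *af,
				const struct model_sock *sk)
{
	pkt->overhead = af->net_header_len + af->ip_options_len(sk) +
			MODEL_SCTPHDR_LEN;
}

/* Largest DATA payload that fits, rounded down to a multiple of 4 in the
 * same spirit as the SCTP_TRUNC4() call on max_data in the patch.
 */
static size_t model_max_data(size_t pmtu, const struct model_packet *pkt,
			     size_t datachk_len)
{
	assert(pmtu >= pkt->overhead + datachk_len);
	return (pmtu - pkt->overhead - datachk_len) & ~(size_t)3;
}
```

With a 1500-byte path MTU, IPv4 and a 16-byte DATA chunk header, setting 12 bytes of options lowers the payload budget from 1452 to 1440 bytes, but only once model_packet_config() runs again. Until then the packet carries the stale overhead, which is the window the thread is concerned about.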