Em Tue, Sep 25, 2007 at 03:30:48PM +0100, Gerrit Renker escreveu: > [DCCP]: Handle timestamps on Request/Response exchange separately > > In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response > exchange. This patch addresses the following situation: > * timestamps are recorded on the listening socket; I noticed this, I think it is unaceptable to do that, therefore the best thing is to combine the two patches, so as not to introduce a problem that is fixed in the following patch. Look below for some other considerations. > * Responses are sent from dccp_request_sockets; > * suppose two connections reach the listening socket with very small time in between: > * the first timestamp value gets overwritten by the second connection request. > > It may seem unlikely that this bug will occur, since the Response is sent out immediately, but to me > this does not seem right. I am further not sure whether the socket locks provide sufficient protection > against overwriting timestamp values on the listening socket. It is just wrong, I cannot see anything that would prevent this window from being hit. > This patch therefore > * makes memory use for timestamps dynamic (to use less when no timestamps are present); > * separates `handshake timestamps' from `established timestamps'; I didn't understood this one. Care to further explain? Anyway, I think that adding yet another allocation in a connection lifetime is not good. One of the most pressing things for me after merging all the patches that are pending (THANK YOU! :-) ) is to work on DCCP memory consuption and accounting, the code has to be made more robust :-\ I think that we should just add dreq_{time,echo} to dccp_request_sock and keep dccp_sock as is. Now it is: [acme@mica net-2.6.24]$ pahole -C dccp_request_sock net/dccp/minisocks.o struct dccp_request_sock { struct inet_request_sock dreq_inet_rsk; /* 0 56 */ __u64 dreq_iss; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __u64 dreq_isr; /* 64 8 */ __be32 dreq_service; /* 72 4 */ /* size: 76, cachelines: 2 */ /* last cacheline: 12 bytes */ }; I suggest it to become: [acme@mica net-2.6.24]$ pahole -C dccp_request_sock net/dccp/minisocks.o struct dccp_request_sock { struct inet_request_sock dreq_inet_rsk; /* 0 56 */ __u64 dreq_iss; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ __u64 dreq_isr; /* 64 8 */ __be32 dreq_service; /* 72 4 */ __u32 dreq_tstamp_echo; /* 76 4 */ ktime_t dreq_tstamp_time; /* 80 8 */ /* size: 88, cachelines: 2 */ /* last cacheline: 24 bytes */ }; Humm, these minisocks are getting fat... another thing for my TODO list, request_sock::ts_recent seems to be used only by the TCP machinery, ripe for the picking.... Anyway, I'll move along the patch queue looking for more easily cherrypickable patches. > * de-allocates in request socket destructor if previous de-allocation has failed. > > Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with > '!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3). Well, not really, a timestamp echo on a request packet would make no sense :-) But yeah, the code right now its wrong as it doesn't puts timestamp echo options in data packets, and that is allowed. > Lastly, with sampling RTTs in mind, the earliest-unacknowledged timestamp is always kept on the > socket (mimicking RFC 1323). This is not fully worked out, to do a RFC1323-style algorithm requires > more work, and possibly some changes; but in can in principle benefit from the provided code. > > Signed-off-by: Gerrit Renker <gerrit@xxxxxxxxxxxxxx> > --- > include/linux/dccp.h | 25 +++++++++++++++++++++---- > net/dccp/dccp.h | 2 +- > net/dccp/ipv4.c | 4 ++++ > net/dccp/ipv6.c | 4 ++++ > net/dccp/minisocks.c | 4 ++++ > net/dccp/options.c | 41 +++++++++++++++++++++++++++-------------- > net/dccp/proto.c | 5 +++++ > 7 files changed, 66 insertions(+), 19 deletions(-) > > --- a/include/linux/dccp.h > +++ b/include/linux/dccp.h > @@ -407,11 +407,30 @@ struct dccp_opt_pend { > > extern void dccp_minisock_init(struct dccp_minisock *dmsk); > > +/** > + * struct dccp_ts_echo - Record incoming timestamp to echo it later > + * @ts_time: arrival time of timestamp > + * @ts_echo: the timestamp value recorded at @ts_time > + */ > +struct dccp_ts_echo { > + ktime_t ts_time; > + __u32 ts_echo; > +}; > + > +/** > + * struct dccp_request_sock - represent DCCP-specific connection request > + * @dreq_inet_rsk: structure inherited from > + * @dreq_iss: initial sequence number sent on the Response (RFC 4340, 7.1) > + * @dreq_isr: initial sequence number received on the Request > + * @dreq_service: service code present on the Request (there is just one) > + * @dreq_tstamp: most recent timestamp received during connection setup > + */ > struct dccp_request_sock { > struct inet_request_sock dreq_inet_rsk; > __u64 dreq_iss; > __u64 dreq_isr; > __be32 dreq_service; > + struct dccp_ts_echo *dreq_tstamp; > }; > > static inline struct dccp_request_sock *dccp_rsk(const struct request_sock *req) > @@ -477,8 +496,7 @@ struct dccp_ackvec; > * @dccps_gar - greatest valid ack number received on a non-Sync; initialized to %dccps_iss > * @dccps_service - first (passive sock) or unique (active sock) service code > * @dccps_service_list - second .. last service code on passive socket > - * @dccps_timestamp_time - time of latest TIMESTAMP option > - * @dccps_timestamp_echo - latest timestamp received on a TIMESTAMP option > + * @dccps_tstamp - most recently received timestamp to echo (RFC 4340, 13.1) > * @dccps_l_ack_ratio - feature-local Ack Ratio > * @dccps_r_ack_ratio - feature-remote Ack Ratio > * @dccps_pcslen - sender partial checksum coverage (via sockopt) > @@ -514,8 +532,7 @@ struct dccp_sock { > __u64 dccps_gar; > __be32 dccps_service; > struct dccp_service_list *dccps_service_list; > - ktime_t dccps_timestamp_time; > - __u32 dccps_timestamp_echo; > + struct dccp_ts_echo *dccps_tstamp; > __u16 dccps_l_ack_ratio; > __u16 dccps_r_ack_ratio; > __u16 dccps_pcslen; > --- a/net/dccp/options.c > +++ b/net/dccp/options.c > @@ -56,6 +56,7 @@ int dccp_parse_options(struct sock *sk, > const unsigned char *opt_end = (unsigned char *)dh + > (dh->dccph_doff * 4); > struct dccp_options_received *opt_recv = &dp->dccps_options_received; > + struct dccp_ts_echo **tse; > unsigned char opt, len; > unsigned char *value; > u32 elapsed_time, opt_val; > @@ -159,12 +160,25 @@ int dccp_parse_options(struct sock *sk, > if (len != 4) > goto out_invalid_option; > > + tse = dreq? &dreq->dreq_tstamp : dp->dccps_tstamp; huh? both dreq_tstamp and dccps_tstamp are pointers to dccp_ts_echo, you are mixing pointer types here by using '&' on one and not on the other :) > + /* > + * Keep the earliest received timestamp on the socket, > + * until echoing to the peer frees it. This policy is > + * useful for doing RTT measurements (eg. RFC 1323, 3.4) > + */ > + if (*tse != NULL) > + break; > + > + *tse = kmalloc(sizeof(struct dccp_ts_echo), GFP_ATOMIC); > + if (*tse == NULL) { > + DCCP_CRIT("can not store received timestamp"); we avoid this situation by having the dccp_ts_echo fields in the minisock and keeping dccp_sock as is now. > + break; > + } > + > + (*tse)->ts_time = ktime_get_real(); > opt_val = get_unaligned((u32 *)value); > opt_recv->dccpor_timestamp = ntohl(opt_val); > - > - /* FIXME: if dreq != NULL, don't store this on listening socket */ > - dp->dccps_timestamp_echo = opt_recv->dccpor_timestamp; > - dp->dccps_timestamp_time = ktime_get_real(); > + (*tse)->ts_echo = opt_recv->dccpor_timestamp; > > dccp_pr_debug("%s rx opt: TIMESTAMP=%u, ackno=%llu\n", > dccp_role(sk), opt_recv->dccpor_timestamp, > @@ -384,15 +398,14 @@ int dccp_insert_option_timestamp(struct > > EXPORT_SYMBOL_GPL(dccp_insert_option_timestamp); > > -static int dccp_insert_option_timestamp_echo(struct sock *sk, > +static int dccp_insert_option_timestamp_echo(struct dccp_ts_echo **tse, > struct sk_buff *skb) > { > - struct dccp_sock *dp = dccp_sk(sk); > __be32 tstamp_echo; > int len, elapsed_time_len; > unsigned char *to; > const suseconds_t delta = ktime_us_delta(ktime_get_real(), > - dp->dccps_timestamp_time); > + (*tse)->ts_time); > u32 elapsed_time = delta / 10; > elapsed_time_len = dccp_elapsed_time_len(elapsed_time); > len = 6 + elapsed_time_len; > @@ -406,7 +419,7 @@ static int dccp_insert_option_timestamp_ > *to++ = DCCPO_TIMESTAMP_ECHO; > *to++ = len; > > - tstamp_echo = htonl(dp->dccps_timestamp_echo); > + tstamp_echo = htonl((*tse)->ts_echo); > memcpy(to, &tstamp_echo, 4); > to += 4; > > @@ -418,8 +431,8 @@ static int dccp_insert_option_timestamp_ > memcpy(to, &var32, 4); > } > > - dp->dccps_timestamp_echo = 0; > - dp->dccps_timestamp_time = ktime_set(0, 0); > + kfree(*tse); > + *tse = NULL; > return 0; > } > > @@ -528,10 +541,6 @@ int dccp_insert_options(struct sock *sk, > dccp_ackvec_pending(dp->dccps_hc_rx_ackvec) && > dccp_insert_option_ackvec(sk, skb)) > return -1; > - > - if (dp->dccps_timestamp_echo != 0 && > - dccp_insert_option_timestamp_echo(sk, skb)) > - return -1; > } > > if (dp->dccps_hc_rx_insert_options) { > @@ -555,6 +564,10 @@ int dccp_insert_options(struct sock *sk, > dccp_insert_option_timestamp(sk, skb)) > return -1; > > + if (dp->dccps_tstamp != NULL && > + dccp_insert_option_timestamp_echo(&dp->dccps_tstamp, skb)) > + return -1; > + > /* XXX: insert other options when appropriate */ > > if (DCCP_SKB_CB(skb)->dccpd_opt_len != 0) { > --- a/net/dccp/dccp.h > +++ b/net/dccp/dccp.h > @@ -413,7 +413,7 @@ static inline void dccp_update_gss(struc > static inline int dccp_ack_pending(const struct sock *sk) > { > const struct dccp_sock *dp = dccp_sk(sk); > - return dp->dccps_timestamp_echo != 0 || > + return dp->dccps_tstamp != NULL || > #ifdef CONFIG_IP_DCCP_ACKVEC > (dccp_msk(sk)->dccpms_send_ack_vector && > dccp_ackvec_pending(dp->dccps_hc_rx_ackvec)) || > --- a/net/dccp/minisocks.c > +++ b/net/dccp/minisocks.c > @@ -120,6 +120,7 @@ struct sock *dccp_create_openreq_child(s > newdp->dccps_role = DCCP_ROLE_SERVER; > newdp->dccps_hc_rx_ackvec = NULL; > newdp->dccps_service_list = NULL; > + newdp->dccps_tstamp = NULL; > newdp->dccps_service = dreq->dreq_service; > newicsk->icsk_rto = DCCP_TIMEOUT_INIT; > > @@ -303,9 +304,12 @@ EXPORT_SYMBOL_GPL(dccp_reqsk_send_ack); > > void dccp_reqsk_init(struct request_sock *req, struct sk_buff *skb) > { > + struct dccp_request_sock *dreq = dccp_rsk(req); > + > inet_rsk(req)->rmt_port = dccp_hdr(skb)->dccph_sport; > inet_rsk(req)->acked = 0; > req->rcv_wnd = sysctl_dccp_feat_sequence_window; > + dreq->dreq_tstamp = NULL; > } > > EXPORT_SYMBOL_GPL(dccp_reqsk_init); > --- a/net/dccp/proto.c > +++ b/net/dccp/proto.c > @@ -254,6 +254,11 @@ int dccp_destroy_sock(struct sock *sk) > kfree(dp->dccps_service_list); > dp->dccps_service_list = NULL; > > + if (dp->dccps_tstamp != NULL) { > + kfree(dp->dccps_tstamp); > + dp->dccps_tstamp = NULL; > + } > + > if (dmsk->dccpms_send_ack_vector) { > dccp_ackvec_free(dp->dccps_hc_rx_ackvec); > dp->dccps_hc_rx_ackvec = NULL; > --- a/net/dccp/ipv4.c > +++ b/net/dccp/ipv4.c > @@ -551,6 +551,10 @@ out: > > static void dccp_v4_reqsk_destructor(struct request_sock *req) > { > + struct dccp_request_sock *dreq = dccp_rsk(req); > + > + if (dreq->dreq_tstamp != NULL) > + kfree(dreq->dreq_tstamp); > kfree(inet_rsk(req)->opt); > } > > --- a/net/dccp/ipv6.c > +++ b/net/dccp/ipv6.c > @@ -295,6 +295,10 @@ done: > > static void dccp_v6_reqsk_destructor(struct request_sock *req) > { > + struct dccp_request_sock *dreq = dccp_rsk(req); > + > + if (dreq->dreq_tstamp != NULL) > + kfree(dreq->dreq_tstamp); > if (inet6_rsk(req)->pktopts != NULL) > kfree_skb(inet6_rsk(req)->pktopts); > } > - > To unsubscribe from this list: send the line "unsubscribe dccp" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe dccp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html