On Mon, Nov 9, 2009 at 4:09 AM, Gerrit Renker <gerrit@xxxxxxxxxxxxxx> wrote:
> | > To sum up, here is what I think is minimally required to satisfy the union
> | > of RFC 4340, 4342, 4828, 5348, and 5622:
> | >
> | >   struct tfrc_tx_packet_info {
> | >           u64     seqno:48,
> | >                   is_ect0:1,
> | >                   is_data_packet:1,
> | >                   is_in_loss_interval:1;
> | >           u32     send_time;
> | >           u32     rtt_estimate;
> | >           struct tfrc_tx_packet_info *next;       /* FIFO */
> | >   };
> | >
> | > That would be a per-packet storage cost of about 16 bytes, plus the pointer
> | > (8 bytes on 64-bit architectures). One could avoid the pointer by defining a
> | >   u64 base_seqno;
> | > and then
> | >   struct tfrc_tx_packet_info[some constant here];
> | > and then index the array relative to the base_seqno.
> | >
> |
> | Yes, I believe that struct is enough too. But how long would the struct
> | array need to be?
> |
> The problem is the same as with Ack Vectors - the array (or list) can grow
> arbitrarily large. You made a good reply, since all the questions are
> inter-related. The first two I see here are
>
>  1) the choice of data structure (array or list)
>  2) the design of a garbage-collector
>
> This includes your point from above, about the maximum size. To draw the
> analogy to Ack Vectors, at the moment they use a fixed size. On certain
> media (WiFi) there exist situations where even that fixed limit is
> reached, causing an overflow with Ack Vectors that have reached a size
> of 2 * 253 = 506 bytes.
>
> Looking after old data of sent packets is similar; the main difference I
> see is that at some stage "unused" old entries need to be collected, to
> avoid the overflow problem which occurs when using a fixed-size structure.
>
> I find that 'Acknowledgments of Acknowledgments' is a bad idea, since it
> means implementing reliable delivery over unreliable transport; on the
> one hand DCCP is designed to be unreliable, but here suddenly there is a
> break with that design decision.
>
> So point (2) will probably mean coming up with some guessed heuristics
> that will work for most scenarios, but may fail in others.
>
> This is why I am not a big fan of the sender-based solution: solving (2) and
> your question from above requires a lot of testing of the garbage-collection
> and book-keeping algorithms, rather than of the actual congestion control.
>
> One can spend a lot of time going over these issues, but of what use is the
> most ingenious data structure if the overall protocol behavior does not
> deliver a useful performance to users of the protocol?

Yes, a sender-based implementation seems really complicated, even in principle.
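Just to make the fixed-array idea concrete, here is roughly what I picture
(a sketch only - the size constant and the helper name are invented, and I
assume DCCP's 48-bit sequence space):

#define TFRC_TX_HIST_SIZE 256           /* invented; must be a power of 2 */
#define TFRC_SEQNO_MASK   ((1ULL << 48) - 1)

struct tfrc_tx_hist {
        u64 base_seqno;                 /* oldest sequence number still stored */
        /* per-packet info as in your struct, minus the FIFO pointer */
        struct tfrc_tx_packet_info entries[TFRC_TX_HIST_SIZE];
};

/* Return the entry for `seqno', or NULL if it was never stored or has
 * already been overwritten/garbage-collected. */
static struct tfrc_tx_packet_info *
tfrc_tx_hist_entry(struct tfrc_tx_hist *h, u64 seqno)
{
        u64 dist = (seqno - h->base_seqno) & TFRC_SEQNO_MASK;

        if (dist >= TFRC_TX_HIST_SIZE)
                return NULL;
        return &h->entries[seqno & (TFRC_TX_HIST_SIZE - 1)];
}

Garbage collection then reduces to advancing base_seqno, which is exactly
the part that still needs the heuristics you mention.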
> | > IIb) Further remarks
> | > --------------------
> | > At first sight it would seem that storing the RTT also solves the problem
> | > of inaccurate RTTs used at the receiver. Unfortunately, this is not the
> | > case. X_recv is sampled over intervals of varying length which may or may
> | > not equal the RTT. To factor out the effect of window counters, the sender
> | > would need to store the packet size as well and would need to use rather
> | > complicated computations - an ugly workaround.
> |
> | I didn't understand how the packet size would help and what
> | computations are needed.
> |
> The above still refers to the earlier posting about letting the sender
> supply the RTT estimate R_i in packet `i' as defined in RFC 5348, 3.2.1.
>
> Though the same section later suggests that a coarse-grained timestamp
> is sufficient, in practice the inaccuracy of the RTT means an inaccurate
> X_recv, and as a consequence sub-optimal protocol performance.
>
> The problem is that the algorithm from RFC 4342, 8.1 assumes that the
> rate of change of the window counter also relates to the change of
> packet spacing (the difference between the T_i packet arrival times).
>
> However, especially when using high-speed (1/10 Gbit) networking, this
> assumption often does not hold in practice. Packets are sent by the network
> card in bunches, or intervening switches/routers cause a compression of
> packet inter-arrival times. Hence it is perfectly possible that a bundle of
> packets with different Window Counter CCVal values arrives at virtually
> the same time. For instance, on a 100 Mbit/s Ethernet I have seen spikes of
> X_recv of up to 2-3 Gbit/s - several orders of magnitude above the real
> packet speed (not to mention the unrealistic value).
>
> So the question above was asking whether there is a way for the sender
> to "compute away" the inaccuracies reported by the receiver. Your reply
> confirms my doubts that doing this is probably not possible.
>
> To clarify, I was asking whether it would be possible for the sender to
> perform step (2) of RFC 5348, 6.2, to compensate for the fact that the
> receiver does not have a reliable RTT estimate.

I understand the issue now, thanks. Isn't it better to just send the RTT
estimate to the receiver, as the RFC says?

> For example, when receiving feedback for packet `i', it would iterate
> through the list/array, going back over as many packets as are covered
> by the send time T_i of packet `i' minus the RTT estimate R_i at that
> time, sum their packet sizes, and from that value recompute X_recv.
>
> This is a bit complicated if the garbage-collector has already purged
> older entries, so part of the garbage collector would probably have
> to watch over acknowledged packets. I add this as item
>  3) validate X_recv
> to the above running list of book-keeping items done at the sender.
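If we do end up going down that road, I imagine the X_recv validation would
look roughly like the sketch below, reusing the history helper from my note
above (again, the names are invented; it also illustrates your point that the
packet size would have to be stored per packet, here as a hypothetical
payload_len field, with send_time assumed to be in microseconds):

/* Sketch of step (2) of RFC 5348, 6.2 done at the sender: walk back from
 * the acknowledged packet over everything sent within one RTT before its
 * send time T_i, sum the packet sizes, and derive the rate the receiver
 * should have seen. */
static u32 tfrc_tx_validate_x_recv(struct tfrc_tx_hist *h, u64 ack_seqno, u32 rtt)
{
        struct tfrc_tx_packet_info *pkt = tfrc_tx_hist_entry(h, ack_seqno);
        u64 seq = ack_seqno, bytes = 0;
        u32 t_i;

        if (pkt == NULL || rtt == 0)
                return 0;               /* history purged, or no RTT yet */
        t_i = pkt->send_time;

        while ((pkt = tfrc_tx_hist_entry(h, seq)) != NULL &&
               (u32)(t_i - pkt->send_time) <= rtt) {   /* within [T_i - R_i, T_i] */
                bytes += pkt->payload_len;             /* hypothetical extra field */
                seq = (seq - 1) & TFRC_SEQNO_MASK;
        }
        return (u32)div64_u64(bytes * USEC_PER_SEC, rtt);  /* bytes per second */
}

The sender could then compare this value against the X_recv reported by the
receiver and clamp obvious outliers such as the 2-3 Gbit/s spikes you mention.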
> | > One thing I stumbled across while reading your code was the fact that RFC 4342
> | > leaves it open as to how many Loss Intervals to send: on the one hand it follows
> | > the suggestion of RFC 5348 to use 1+NINTERVAL=9, but on the other hand it does
> | > not restrict the number of loss intervals. Also RFC 5622 does not limit the
> | > number of Loss Intervals / Data Dropped options.
> | >
> | > If receiving n > 9 Loss Intervals, what does the sender do with the n-9 older
> | > intervals? There must be some mechanism to stop these options from growing
> | > beyond bounds, so it needs to store also which loss intervals have been
> | > acknowledged, introducing the "Acknowledgment of Acknowledgments" problem.
> |
> | In RFC 4342 section 8.6 it says that the limit of loss interval data
> | to send is 28 intervals, and RFC 5622 section 8.7 says 84 for the Dropped
> | Packets option. But I don't see why we should send so much data in these
> | options. Yes, the most recent 9 loss intervals are required to be reported,
> | except if the sender has acknowledged previously sent loss intervals, in
> | which case only one is required, the open interval.
> | And we can avoid the "Acknowledgment of Acknowledgments" if we always send
> | the required 9 loss intervals, I think.
> |
> | > A second point is how to compute the loss event rate when n > 9. It seems
> | > that this would mean grinding through all loss intervals using a window
> | > of 9. If that is the case, the per-packet computation costs become very
> | > expensive.
> |
> | RFC 4342 section 8.6 suggests that only 9 loss intervals are required
> | anyway. And I believe that's enough for the computation of the current
> | mean loss interval. What do you think?
> |
> Yes, absolutely, I am completely in favour of this very sensible suggestion.
>
> If people really must experiment with such outdated data, that could be
> done in truly experimental patches. Especially since RFC 5348 normatively
> recommends a value of n = 8 in section 5.4. And we are saving further
> headaches about the book-keeping/garbage collection of old data.
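To back this up with numbers: with n = 8 the loss event rate computation
stays cheap. Roughly, following RFC 5348, 5.4 (a sketch; the function name
is mine, and the weights are the RFC's 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2
scaled by 10 to keep the arithmetic integral):

/* li[0] is the open (most recent) loss interval, li[1..8] the closed ones.
 * Returns the weighted mean loss interval I_mean; the loss event rate is
 * then p = 1/I_mean. */
static const u32 tfrc_li_weights[8] = { 10, 10, 10, 10, 8, 6, 4, 2 };

static u32 tfrc_mean_loss_interval(const u32 li[9])
{
        u64 i_tot0 = 0, i_tot1 = 0, w_tot = 0;
        int i;

        for (i = 0; i < 8; i++) {
                i_tot0 += (u64)li[i]     * tfrc_li_weights[i];  /* includes open interval */
                i_tot1 += (u64)li[i + 1] * tfrc_li_weights[i];  /* closed intervals only  */
                w_tot  += tfrc_li_weights[i];
        }
        /* I_mean = max(I_tot0, I_tot1) / W_tot, per RFC 5348, 5.4 */
        return (u32)div64_u64(max(i_tot0, i_tot1), w_tot);
}

That is a handful of multiplications per feedback packet, so with n fixed
at 8/9 the computation cost really is a non-issue.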
> | > II) Computational part of the implementation
> | > --------------------------------------------
> | > If only Loss Intervals alone are used, only these need to be verified
> | > before being used to alter the sender behaviour.
> | >
> | > But when one or more other DCCP options also appear, the verification is
> | >  * intra: make sure each received option is in itself consistent,
> | >  * inter: make sure options are mutually consistent.
> | >
> | > The second has a combinatorial effect, i.e. n! verifications for n options.
> | >
> <snip>
> |
> | Yes, there's a combinatorial problem in checking the options for consistency.
> | But what if we find out that some option doesn't match the others?
> | What action would be taken?
> I add this as
>  4) define a policy for dealing with options that are not mutually consistent
>
> | First, what can cause the receiver to send inconsistent options?
> | A bad implementation only?
> Yes, I think that a bad implementation (whether on purpose or not) would be
> the main cause, since header options are protected even if partial
> checksums are used (RFC 4340, 9.2).
>
> But there is also the benign case mentioned at the end of RFC 4342, 9.2,
> where a receiver collapses multiple losses into a single loss event, i.e.
>  5) validate received Loss Intervals and regroup the receiver-based
>     information if necessary, without interpreting this as attempted
>     receiver misbehaviour.
>
> | According to the ECN Nonce Echo sum algorithm, if a receiver is found to be
> | lying about loss or to be badly implemented, the sender adjusts the send
> | rate as if loss were perceived.
> | Can we do the same in this situation? If so, can we skip checking options
> | against each other and only check the ECN Nonce sum?
> This is difficult since Ack Vectors and Loss Intervals use different
> definitions of the ECN Nonce sum (last paragraph in RFC 4342, 9.1), i.e. we have
>  6) separate algorithms to compute the Ack Vector / Loss Intervals ECN Nonce sums.
>
> With regard to (5) above, your suggestion gives
>  7) validate options; on a mismatch other than (5), only validate the ECN Nonce.
>
> | If some option is wrong, it either shows more loss (or any situation worse
> | for the receiver) or conceals loss. In the first case, I don't believe we
> | need to care, and in the second, the ECN Nonce sum can reveal the bad
> | acting of the receiver.
> Yes, you are right, we need not worry if a receiver reports a higher loss rate
> than the verification done by the sender (which recomputes the data that the
> receiver already has computed) calculates.
>
> But for the second case, there is no guarantee to catch a misbehaving
> receiver, only a 50% chance at the end of many computations.

Isn't it a 50% chance at each ECN Nonce verification? Over k verifications
that compounds to a 1 - (1/2)^k chance of catching a lying receiver, so at
the end wouldn't we be close to (though never exactly at) 100%?

> RFC 4342, 9 suggests one way of verifying Loss Intervals / Ack Vectors:
>  8) occasionally do not send a packet, or send a packet out of order.
>
> This increases the complexity of book-keeping: the sender needs to keep track
> of which of the sent packets was a fake send/drop. It also requires an
> algorithm to iterate over the sender data structures in order to find out
> whether the reasoning of the receiver is sane. I have doubts whether this can
> be done without sacrificing the performance of the in-kernel sender side.

I have doubts too. This seems too complicated and not very useful.

> | > III) Closing remarks in favour of a receiver-based implementation
> | > ------------------------------------------------------------------
> | > Finally, both RFC 4342 and RFC 5622 do not explicitly discard the
> | > possibility of using a receiver-based implementation. Quoting
> | > RFC 4342, 3.2: "If it prefers, the sender can also use a loss event
> | > rate calculated and reported by the receiver."
> | > Furthermore, the revised TFRC specification points out in section 7
> | > the advantages that a receiver-based implementation has:
> | >  * it does not mandate reliable delivery of packet loss data;
> | >  * it is robust against the loss of feedback packets;
> | >  * it is better suited for scalable server design.
> | >
> | > Quite likely, if the server does not have to store and validate a mass
> | > of data, it is also less prone to being toppled by DoS attacks.
> |
> | You're right. But what the RFCs say about it is almost exactly the
> | opposite, isn't it? What can we do about it? I like the receiver-based
> | design, but I believe that loss intervals are interesting, mostly because
> | of receiver behaviour verification.
> |
> While writing the above reply, I was amazed to see how much of the computation
> that has already been done at the receiver needs to be done again at the
> sender, just in order to be able to verify the data.
>
> To me this seems very inefficient.
>
> Moreover, the biggest danger I see here is spending a lot of time on the
> details of sender book-keeping and verification, just to then see that the
> performance of CCID-3/4 in practice turns out to be below the standards
> acceptable to even modest users.
>
> I think it is clearly better to prefer the simplest possible implementation
> in such cases, to better debug live protocol performance.
>
> In particular, since CCID-4 is designed to be an experimental protocol
> (i.e. if it works, RFC 5622 may mature into a Proposed Standard; if not,
> it might be superseded by a different specification).
>
> And I think that testing the actual user performance has the highest priority.

Yes, we can work on a simpler implementation at the receiver side first and
focus on performance, tests, and features. Once we have a stable version that
is good enough in performance terms, we can continue improving the sender side.

> The literature on the subject is almost exclusively based on experiences in
> ns-2 userland. Almost no Internet experiments at all have been done with DCCP.
>
> This is because the IPPROTO = 33 identifier needs to be entered
> specially into a firewall, which opens holes that firewall
> administrators don't like to open (unless the firewall is based on
> a recent Linux kernel, opening all ports for IP protocol identifier
> 33 is the only way of allowing DCCP traffic in/out of a firewall).
>
> In addition, most people use NAT at home, putting another obstacle
> in the way of experiments. The result is then that tests are done in a
> lab testbed or in virtualisation - emulated networks.
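For what it's worth, on a Linux firewall at least, letting DCCP through is
one rule per direction (assuming an iptables build that knows the protocol
name; otherwise `-p 33' should work the same):

        # Allow DCCP (IP protocol 33) in and out
        iptables -A INPUT  -p dccp -j ACCEPT
        iptables -A OUTPUT -p dccp -j ACCEPT

NAT traversal remains the harder obstacle, as you say.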
> To conclude, I still think that the simpler, receiver-based implementation
> gives a better start. A 'perfect' receiver implementation is also a good
> reference point to start protocol evaluation: if the performance is bad
> despite getting things right at the receiver, then other parts of the
> protocol need investigation/improvement.

I agree. It's a risk to work on the sender at the moment, implementing these
features and algorithms, only to end up with a CCID that doesn't reach the
expected performance.

Can you list the pending tasks to be done, in both code and tests?

Cheers,
Ivo

--
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935

Putt's Law:
      Technology is dominated by two types of people:
      Those who understand what they do not manage.
      Those who manage what they do not understand.