Also, as noted earlier in this discussion, RFC 7657 explicitly discourages use of multiple DSCPs in a single TCP connection. That needs to be reflected in the TCP encapsulation text in the trill-over-ip draft; in particular, the current Section 4.3 text on mapping TRILL priority and DEI to DSCPs does not appear to be consistent with RFC 7657 for TCP-based encapsulation.

Thanks,
--David

> -----Original Message-----
> From: Donald Eastlake [mailto:d3e3e3@xxxxxxxxx]
> Sent: Friday, June 30, 2017 9:43 PM
> To: Black, David <david.black@xxxxxxx>
> Cc: Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx>; tsv-art@xxxxxxxx; draft-ietf-trill-over-ip.all@xxxxxxxx; IETF Discussion <ietf@xxxxxxxx>; trill@xxxxxxxx
> Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10 - ECN & DSCP considerations
>
> Hi David,
>
> On Mon, Jun 26, 2017 at 3:04 PM, Black, David <David.Black@xxxxxxxx> wrote:
> > Adding some comments on ECN and DSCP ...
> >
> >> > Section 4.3:
> >> >
> >> >    TRILL over IP implementations MUST support setting the DSCP value in
> >> >    the outer IP Header of TRILL packets they send by mapping the TRILL
> >> >    priority and DEI to the DSCP. They MAY support, for a TRILL Data
> >> >    packet where the native frame payload is an IP packet, mapping the
> >> >    DSCP in this inner IP packet to the outer IP Header with the default
> >> >    for that mapping being to copy the DSCP without change.
> >> >
> >> > I think it is fine to require that implementations are capable of setting DSCP values on the outer IP header. However, I fail to see any discussion of the potential issues with actually setting the DSCP values. It is one thing to do this in an IP backbone use case where one can know and have control over the PHB that the DSCP values map to. But otherwise, over the general Internet the behavior is not that predictable. One can easily be subject to policers or remapping.
> >> > Also, as the actual DSCP code point usage is domain specific, this is difficult. Priority reversal is likely the least of the problems that this can run into over the general Internet.
> >>
> >> It sounds like appropriate discussion and warnings about these issues would resolve the above comment.
> >
> > For ECN, see RFC 6040 and draft-ietf-tsvwg-rfc6040update-shim. In particular, copying the inner ECN codepoint to the outer IP header at encapsulation without requiring decapsulation processing as specified in RFC 6040 or the 6040update-shim draft can lose congestion indications from the network and hence is wrong (it's also wrong with respect to RFC 3168, but RFC 6040 and the 6040update-shim draft are better and more current references).
>
> That's a good point.
>
> > For DSCPs, start with RFC 2983. Thinking about the validity (or likely validity) of the outer DSCP at the decapsulator may help in choosing whether to recommend a uniform model (e.g., copy DSCP out at ingress, copy back in at egress) or a pipe model (e.g., do something reasonable for outer DSCP at ingress, ignore it on egress) as the implementation default.
>
> I believe the default behavior in the current draft is the best default. That sets DSCP based on the same TRILL Header indicia that control default QoS on non-IP links.
>
> > -- DSCP mapping to/from TRILL/Ethernet priorities
> >
> >> The intent in the draft is to reflect the default relative priority of the different priority code points in IEEE Std 802.1Q where priority 1 is lower than priority 0. At a quick look, it appears to me that RFC 2474 requires that 001000 (binary) be handled as being of a priority not lower than the priority with which 000000 is handled. Yet RFC 3662, which you point to, seems to suggest using 001000 as a lower priority code point than 000000.
> >> Given that 3662 not only does not update 2474 but is only Informational while 2474 is Standards Track, I would say that 2474 dominates and that this draft makes the best assumptions it can about default behavior...
> >
> > Well ... that's a discussion about text in RFCs that are well over a decade old, and in an area (less-than-best-effort service) where the aspirations of at least RFC 3662 weren't realized ... but that RFC is not safe to ignore, either.
> >
> > In practice, the specification of CS1 for less-than-best-effort service has been promulgated by RFC 4594 rather than RFC 3662, and RFC 4594 has had significant "running code" impact on network design and operation.
> >
> > As Magnus mentioned RFC 7657, I strongly suggest starting from the RFC 7657 discussion of this topic in order to figure out what to do. I'm not sure what to recommend, but I do think that starting from RFC 7657 (rather than RFC 2474 and RFC 3662) is the better approach.
>
> OK.
>
> > FWIW, TSVWG is in the process of figuring out which DSCP to recommend for less-than-best-effort service in place of CS1; that's likely to be an active topic of discussion in Prague.
>
> I'll try to attend that session.
>
> Thanks,
> Donald
> ===============================
>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>  155 Beaver Street, Milford, MA 01757 USA
>  d3e3e3@xxxxxxxxx
>
> > Thanks, --David
> >
> >> -----Original Message-----
> >> From: Tsv-art [mailto:tsv-art-bounces@xxxxxxxx] On Behalf Of Donald Eastlake
> >> Sent: Sunday, June 25, 2017 8:07 PM
> >> To: Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx>
> >> Cc: tsv-art@xxxxxxxx; draft-ietf-trill-over-ip.all@xxxxxxxx; IETF Discussion <ietf@xxxxxxxx>; trill@xxxxxxxx
> >> Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10
> >>
> >> Hi Magnus,
> >>
> >> Thanks for the extensive review. See my responses below.
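To make the RFC 6040 and RFC 2983 points above more concrete, here is a minimal sketch of how a decapsulating RBridge port might combine inner and outer header fields when the TRILL Data payload is an IP packet. The function names (`encap_ecn`, `decap_ecn`, `decap_dscp`) are hypothetical and are not taken from the draft. The ECN combination follows the RFC 6040 decapsulation table (normal mode). Treating a non-IP inner payload as Not-ECT at encapsulation is an assumption of this sketch. The DSCP function contrasts the uniform and pipe models.

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """Two-bit ECN field codepoints (RFC 3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def encap_ecn(inner: Optional[ECN]) -> ECN:
    """Outer ECN at encapsulation.

    RFC 6040 normal mode copies the inner ECN field. For a non-IP inner
    payload (inner is None) this sketch assumes Not-ECT, since the egress
    would have no inner field to carry a congestion mark back into.
    """
    return ECN.NOT_ECT if inner is None else inner


def decap_ecn(inner: ECN, outer: ECN) -> Optional[ECN]:
    """Inner ECN to forward after decapsulation (RFC 6040 table).

    Returns None when the packet must be dropped.
    """
    if inner == ECN.NOT_ECT:
        # A CE mark cannot be conveyed to a non-ECN transport, so drop.
        return None if outer == ECN.CE else ECN.NOT_ECT
    if ECN.CE in (inner, outer):
        return ECN.CE
    if outer == ECN.ECT1:
        return ECN.ECT1
    return inner  # outer Not-ECT or ECT(0): keep the inner value


def decap_dscp(inner_dscp: int, outer_dscp: int, model: str) -> int:
    """Inner DSCP after decapsulation under the RFC 2983 tunnel models."""
    if model == "uniform":
        return outer_dscp  # outer DSCP is copied back into the inner header
    if model == "pipe":
        return inner_dscp  # outer DSCP is ignored at egress
    raise ValueError(f"unknown model: {model!r}")
```

Under the pipe model, the outer DSCP chosen from the TRILL priority never overwrites the inner IP header. That is why David's question about how trustworthy the outer DSCP is at the decapsulator bears directly on which default to pick.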
> >>
> >> On Thu, Jun 15, 2017 at 1:32 PM, Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx> wrote:
> >> >
> >> > Reviewer: Magnus Westerlund
> >> > Review result: Not Ready
> >> >
> >> > Early review of draft-ietf-trill-over-ip-10
> >> >
> >> > TSV-ART review comments:
> >> >
> >> > I have set this to Not Ready as there are several issues, some significant, that could significantly affect the protocol realization. Some may be me missing things in TRILL; I was not that familiar with it before this review, and I have only tried looking up things, not reading all of the earlier specifications. So don't hesitate to push back and provide pointers to things that can resolve issues. The authors and the WG clearly have thought about a lot of issues and dealt with much already.
> >>
> >> OK. Hopefully we can resolve these one way or the other.
> >>
> >> > Diffserv usage
> >> > --------------
> >> >
> >> > Section 4.3:
> >> >
> >> >    TRILL over IP implementations MUST support setting the DSCP value in
> >> >    the outer IP Header of TRILL packets they send by mapping the TRILL
> >> >    priority and DEI to the DSCP. They MAY support, for a TRILL Data
> >> >    packet where the native frame payload is an IP packet, mapping the
> >> >    DSCP in this inner IP packet to the outer IP Header with the default
> >> >    for that mapping being to copy the DSCP without change.
> >> >
> >> > I think it is fine to require that implementations are capable of setting DSCP values on the outer IP header. However, I fail to see any discussion of the potential issues with actually setting the DSCP values. It is one thing to do this in an IP backbone use case where one can know and have control over the PHB that the DSCP values map to. But otherwise, over the general Internet the behavior is not that predictable.
> >> > One can easily be subject to policers or remapping. Also, as the actual DSCP code point usage is domain specific, this is difficult. Priority reversal is likely the least of the problems that this can run into over the general Internet.
> >>
> >> It sounds like appropriate discussion and warnings about these issues would resolve the above comment.
> >>
> >> > Section 4.3:
> >> >
> >> >    The default TRILL priority and DEI to DSCP mapping, which may be
> >> >    configured per TRILL over IP port, is as follows. Note that the DEI
> >> >    value does not affect the default mapping and, to provide a
> >> >    potentially lower priority service than the default priority 0,
> >> >    priority 1 is considered lower priority than 0. So the priority
> >> >    sequence from lower to higher priority is 1, 0, 2, 3, 4, 5, 6, 7.
> >> >
> >> >      TRILL Priority   DEI   DSCP Field (Binary/decimal)
> >> >      --------------   ---   ---------------------------
> >> >            0          0/1          001000 / 8
> >> >            1          0/1          000000 / 0
> >> >            2          0/1          010000 / 16
> >> >            3          0/1          011000 / 24
> >> >            4          0/1          100000 / 32
> >> >            5          0/1          101000 / 40
> >> >            6          0/1          110000 / 48
> >> >            7          0/1          111000 / 56
> >> >
> >> > This appears to be a problematic mapping, at least for priorities 0 and 1. As priority 0 is intended to be higher than priority 1, it is interesting that it is mapped to CS1, which, to quote https://datatracker.ietf.org/doc/rfc7657/:
> >> >
> >> >    CS1 ('001000') was subsequently designated as the recommended
> >> >    codepoint for the Lower Effort (LE) PHB [RFC3662].
> >> >
> >> > So, in a network using the default mapping, what is proposed can result in priority 0 being treated as lower priority than 1. In addition, in some networks this can also result in strange remapping that results in a different PHB for CS1 than.
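The default mapping in the quoted Section 4.3 table, and the ordering concern raised about it, can be checked with a short sketch. `effective_order` is a hypothetical helper. It models a network that treats CS1 as Lower Effort, as in the RFC 7657 text quoted above, and otherwise uses the class-selector value as a rough proxy for precedence, as in RFC 2474. Neither model is taken from the draft.

```python
from typing import Dict, Iterable, List

# Default TRILL priority -> DSCP mapping from the Section 4.3 table quoted above.
# DEI does not affect the default mapping, so it is not part of the key.
DEFAULT_DSCP: Dict[int, int] = {
    0: 0b001000, 1: 0b000000, 2: 0b010000, 3: 0b011000,
    4: 0b100000, 5: 0b101000, 6: 0b110000, 7: 0b111000,
}

# Intended order from lower to higher priority (IEEE 802.1Q default).
INTENDED_ORDER: List[int] = [1, 0, 2, 3, 4, 5, 6, 7]

CS1 = 0b001000  # Lower Effort codepoint in RFC 3662 / RFC 4594 usage


def dscp_for(priority: int, dei: int) -> int:
    """Outer DSCP for a TRILL priority/DEI pair under the default mapping."""
    if priority not in DEFAULT_DSCP or dei not in (0, 1):
        raise ValueError("priority must be 0-7 and DEI must be 0 or 1")
    return DEFAULT_DSCP[priority]


def effective_order(mapping: Dict[int, int],
                    lower_effort: Iterable[int] = (CS1,)) -> List[int]:
    """Priorities from lower to higher likely service (hypothetical model):
    Lower Effort codepoints first, then by class-selector value."""
    le = set(lower_effort)
    return sorted(mapping, key=lambda p: (mapping[p] not in le, mapping[p]))
```

Under plain class-selector ordering (`effective_order(DEFAULT_DSCP, lower_effort=())`), the table reproduces the intended sequence 1, 0, 2, ..., 7. Once CS1 is treated as Lower Effort, the default call returns 0, 1, 2, ..., 7. Priorities 0 and 1 are swapped, which is the inversion described in the review.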
> >>
> >> The intent in the draft is to reflect the default relative priority of the different priority code points in IEEE Std 802.1Q where priority 1 is lower than priority 0. At a quick look, it appears to me that RFC 2474 requires that 001000 (binary) be handled as being of a priority not lower than the priority with which 000000 is handled. Yet RFC 3662, which you point to, seems to suggest using 001000 as a lower priority code point than 000000. Given that 3662 not only does not update 2474 but is only Informational while 2474 is Standards Track, I would say that 2474 dominates and that this draft makes the best assumptions it can about default behavior...
> >>
> >> > MTU and Fragmentation
> >> > ---------------------
> >> >
> >> > I think there are two main issues here. The first one is discovery of the actual IP path MTU between the ports. That will be needed to prevent a lot of traffic going into MTU black holes, especially as TRILL requires 1470-byte support, which is likely above the MTU of a lot of paths.
> >>
> >> It seems like it would depend on the environments where TRILL was used. For example, I do not think 1470 would be a problem in most Data Center or Internet Exchange Point uses. Data Centers sometimes support 9K jumbo frames and the like.
> >>
> >> In fact, it is probably bad to focus too much on 1470 -- that is a required minimum to be sure that reasonable size link state PDUs can be successfully flooded through the TRILL campus so that routing will work. However, it would commonly be the case that, for the TRILL campus to be useful in a particular case, links need to be able to carry the expected size TRILL Data packets.
> >> For example, if there were two parts of a TRILL campus connected by one or a few TRILL over IP links and the end stations in each part were assuming they could use 1500-byte Ethernet packets, then the TRILL over IP links would need to support an MTU based on 1500 + TRILL Header + IP and TRILL over IP encapsulation. And more if security was being used or there were any other reasons for additional headers/encapsulation...
> >>
> >> > Section 8.4:
> >> >
> >> >    Path MTU discovery [RFC4821] should be useful
> >> >    in determining the IP MTU between a pair of RBridge ports with IP
> >> >    connectivity.
> >> >
> >> > The issue with RFC 4821 is that it has requirements on the packetization layer. TRILL appears to have several components that are useful. However, it will require a specification of the procedure to result in a useful tool.
> >>
> >> See below.
> >>
> >> > Section 8.4:
> >> >
> >> >    TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
> >> >    [RFC7177], can be used to obtain added assurance of the MTU of a
> >> >    link.
> >> >
> >> > Yes, that can confirm working MTUs that are at 1470 or above, but it appears to be prevented from working below 1470?
> >>
> >> While there is a minimum size for TRILL IS-IS MTU PDUs, determined by header size, it is well below 1470, probably (depending on whether security is in use, etc.) below 150 bytes.
> >>
> >> > Thus, it appears that there is a lack of mechanism here to actually get a valid and functional MTU from TRILL in the cases where the path MTU is below 1470. If I am wrong, good, but I think this is an important piece for how to handle the next main issue.
> >>
> >> How about referencing Section 3 of https://tools.ietf.org/html/draft-ietf-trill-mtu-negotiation-05 which is currently in IETF Last Call?
> >> (The wording of that section is probably going to be improved based on an OPS review by Brian Carpenter.)
> >>
> >> > UDP encapsulation and IP fragments
> >> > ----------------------------------
> >> >
> >> > I see it as a big issue that UDP encapsulation is the native one, and that it relies on IP fragmentation despite the need for reliable fragmentation. With the requirement to support a 1470-byte MTU at the TRILL level, some packets will be fragmented in many environments. That will lead to a lot of losses and, as discussed below, a very big problem with middleboxes. The main problem here is that if one tries to rely on IP fragments, one will have issues with packets ending up in black holes, with different problems for IPv4 and IPv6. IPv6 is likely the lesser problem, assuming that one has working PMTUD.
> >> >
> >> > There are several ways out of this.
> >> >
> >> > 1. Detect issues and use TCP encapsulation with a correctly set MSS to avoid IP fragments.
> >> > 2. Determine the MTU and implement a fragmentation mechanism on top of UDP.
> >>
> >> So, I don't see that much problem with UDP being the general default, consistent with the TRILL philosophy of defaulting to need zero or minimal configuration. The default should be to use multicast Hellos for discovery of neighbors, which sure points at UDP to me. Having to traverse a NAT should be a rare case. Since, in the NAT case, you have to configure things related to the static binding and the IP address(es) of peer(s) anyway, you can also configure the use of a different encapsulation than UDP, such as TCP, at the same time. I don't see it as much of a problem if, by default, TRILL won't operate through a NAT.
> >> If you are using UDP and it fragments and the fragments are dropped at a NAT, probably you can't exchange Hellos, so you will not form an adjacency and anything on the other side of the NAT will not be visible.
> >>
> >> > Zero Checksum:
> >> > --------------
> >> >
> >> > Section 5.4:
> >> >
> >> >    UDP Checksum - as specified in [RFC0768]
> >> >
> >> > Considering the fast-path encapsulation desire, I am surprised not to see any mention of the use of zero checksum here. Raising the zero checksum with a forward reference would be good, I think.
> >> >
> >> > And then Section 8.5:
> >> >
> >> >    The requirements for the usage of the zero UDP Checksum in a UDP
> >> >    tunnel protocol are detailed in [RFC6936]. These requirements apply
> >> >    to the UDP based TRILL over IP encapsulations specified herein
> >> >    (native and VXLAN), which are applications of UDP tunnel.
> >> >
> >> > If you actually intended to allow zero checksum, then you should document that TRILL fulfills the requirements that the applicability statement raises. I have not analyzed how well it meets these requirements.
> >> >
> >> > Please review Section 6.2 of RFC 8086 for an example of how that can be done.
> >>
> >> OK. We'll look into it.
> >>
> >> > TCP Encapsulation issue
> >> > -----------------------
> >> >
> >> > Section 5.6:
> >> >
> >> > The TCP encapsulation appears to be missing a delimiter format allowing each individual TRILL packet/payload to be read out of TCP's byte stream. In other words, a normal implementation has no way of ensuring that the TCP payload starts with the start of a new TRILL payload. Multiple small TRILL payloads may be included in the same TCP payload, or only part of one, as TCP is one way of dealing with TRILL packets that are larger than the IP+encapsulation MTU that will actually work.
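The missing delimiter can be illustrated with a 2-byte length prefix in front of each TRILL packet. This is a minimal sketch, not the draft's wire format. `frame` and `Deframer` are hypothetical names, and the sketch omits any check field of the BGP-marker kind. Its key property is that the receiver recovers packet boundaries correctly no matter how TCP splits or coalesces the byte stream.

```python
import struct
from typing import List

MAX_FRAME = 0xFFFF  # a 2-byte length field caps each TRILL packet at 65535 bytes


def frame(packet: bytes) -> bytes:
    """Prefix one TRILL packet with its length (network byte order)."""
    if len(packet) > MAX_FRAME:
        raise ValueError("TRILL packet too large for a 2-byte length field")
    return struct.pack("!H", len(packet)) + packet


class Deframer:
    """Reassembles TRILL packets from arbitrary TCP read chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes; return every packet that is now complete."""
        self._buf += chunk
        packets: List[bytes] = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf, 0)
            if len(self._buf) < 2 + length:
                break  # wait for the rest of this packet
            packets.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return packets
```

Feeding the stream one byte at a time yields the same packets as feeding it whole, which is the guarantee the TCP encapsulation needs.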
> >> >
> >> > This comment is based on there appearing to be no length field in the TRILL Header. The most straightforward delimiter is a 2-byte length field for the TRILL payload to be encapsulated.
> >>
> >> Right. It might also be useful to include some sort of check field, as is done in BGP, to detect if you are out of sync in parsing the TCP stream.
> >>
> >> Another point is that, while with UDP it seems fine to send packets with assorted QoS, you don't want to encourage re-ordering of TCP packets in a stream. So if TCP encapsulation is being used, you want to use the same DSCP value for the packets in a particular TCP stream. So, generally, you need to have a TCP connection per priority handling category. Mapping the 8 priority levels into a smaller number of handling categories is a normal thing to do, so you don't necessarily need 8 TCP connections. Adding material on this should not be too hard.
> >>
> >> > Section 5.6:
> >> >
> >> > TCP endpoint requirements: I do wonder if an application like TRILL actually would need to discuss performance-impacting implementation choices or limitations, for example the use of Nagle's algorithm, or the requirements on buffer sizes in relation to bandwidth-delay products, as buffer memory in an RBridge will impact performance.
> >>
> >> Well, I'm not sure how deeply this document should get into such performance issues. What about just saying something about consideration being given to tuning TCP for performance and pointing to one or a few other RFCs that talk about this?
> >>
> >> > Congestion Control
> >> > ------------------
> >> > First, thanks for the effort here.
> >>
> >> You're welcome.
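Donald's suggestion of one TCP connection per priority handling category, combined with the RFC 7657 guidance that David cites (a single DSCP per TCP connection), might look like the following sketch. The three-category grouping and the DSCP values are hypothetical placeholders, not a recommendation, and `PerCategoryTcp` is an illustrative name. It sets the DSCP through the IPv4 `IP_TOS` socket option, where the DSCP occupies the upper six bits of the TOS byte. IPv6 would need the traffic class option instead.

```python
import socket
from typing import Callable, Dict, Tuple

# Hypothetical grouping of the 8 TRILL priorities into 3 handling categories.
CATEGORY_OF_PRIORITY: Dict[int, str] = {
    1: "low",
    0: "default", 2: "default", 3: "default",
    4: "high", 5: "high", 6: "high", 7: "high",
}

# Placeholder DSCP per category. Each category has exactly one DSCP,
# so every TCP connection carries a single DSCP (RFC 7657).
DSCP_OF_CATEGORY: Dict[str, int] = {
    "low": 0b001000,      # CS1
    "default": 0b000000,  # default PHB
    "high": 0b101000,     # CS5
}


class PerCategoryTcp:
    """Lazily opens one TCP connection per handling category to a peer."""

    def __init__(self, peer: Tuple[str, int],
                 connect: Callable[..., socket.socket] = socket.create_connection):
        self._peer = peer
        self._connect = connect
        self._socks: Dict[str, socket.socket] = {}

    def sock_for(self, priority: int) -> socket.socket:
        """Return the connection for this priority's category, opening it once."""
        category = CATEGORY_OF_PRIORITY[priority]
        if category not in self._socks:
            sock = self._connect(self._peer)
            # IPv4 only: DSCP is the upper 6 bits of the TOS byte.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                            DSCP_OF_CATEGORY[category] << 2)
            self._socks[category] = sock
        return self._socks[category]
```

Each category's connection is opened once and then reused, so packets of one category never change DSCP mid-stream. This grouping needs three connections rather than eight.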
> >>
> >> >    8.1.2 In Other Environments
> >> >
> >> >    Where UDP based encapsulation headers are used in TRILL over IP in
> >> >    environments other than those discussed in Section 8.1.1, specific
> >> >    congestion control mechanisms are commonly needed. However, if the
> >> >    traffic being carried by the TRILL over IP link is already congestion
> >> >    controlled and the size and volatility of the TRILL IS-IS link state
> >> >    database is limited, then specific congestion control may not be
> >> >    needed. See [RFC8085] Section 3.1.11 for further guidance.
> >> >
> >> > This is correct; however, my question is whether the RBridges have any way of knowing which traffic is actually congestion controlled, considering that TRILL provides a layer 2 abstraction. I wonder if there should be some type of whitelist of the layer 2 payload types that can be assumed to be congestion controlled, and thus okay to forward over IP paths. I am worried that, without any recommendation to prevent traffic that is not congestion controlled from being forwarded, this can lead to congestion issues.
> >> >
> >> > The other issue I think may exist is the one that serial unicast emulation of broadcast/multicast creates. As this amplifies the outgoing packet rate by a factor equal to the number of addresses configured for serial unicast, this can be a significant traffic expansion. Thus, I think additional considerations are needed here, and maybe rate limiting of the amount of traffic to be multicast.
> >>
> >> OK. We can think about those issues.
> >>
> >> > Flow and ECMP
> >> > -------------
> >> >
> >> > Section 8.3:
> >> >
> >> >    For example, for TRILL
> >> >    Data, this entropy field could be based on some hash of the
> >> >    Inner.MacDA, Inner.MacSA, and Inner.VLAN or Inner.FGL.
> >> >
> >> > I would appreciate clearer references to what these fields are.
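The Section 8.3 idea quoted above is to hash the inner flow fields into the outer UDP source port so that ECMP keeps each inner flow on one path. A minimal sketch follows. CRC-32 as the hash and the 49152-65535 dynamic port range are illustrative assumptions, not choices from the draft, and `entropy_source_port` is a hypothetical name. VLAN IDs and FGLs are packed as 32-bit values with a one-byte flag so the two label spaces cannot collide.

```python
import struct
import zlib

# Assumed source-port range for entropy (IANA dynamic/private ports).
PORT_LO, PORT_HI = 49152, 65535


def entropy_source_port(inner_mac_da: bytes, inner_mac_sa: bytes,
                        label: int, is_fgl: bool = False) -> int:
    """Map an inner flow (Inner.MacDA, Inner.MacSA, Inner.VLAN or Inner.FGL)
    to a UDP source port.

    The result depends only on these fields, so every packet of an inner
    flow gets the same port and is never striped across ECMP paths.
    """
    if len(inner_mac_da) != 6 or len(inner_mac_sa) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not 0 <= label < (1 << 24):  # 12-bit VLAN ID or 24-bit FGL
        raise ValueError("label out of range")
    key = inner_mac_da + inner_mac_sa + struct.pack("!BI", is_fgl, label)
    return PORT_LO + zlib.crc32(key) % (PORT_HI - PORT_LO + 1)
```

Because only the three inner fields feed the hash, packets whose fields differ may take different paths. As Donald notes, those packets belong to different streams anyway.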
> >>
> >> In a TRILL Data packet, the payload after the TRILL Header looks like an Ethernet frame except that there is always either a VLAN tag or, alternatively, where the VLAN tag would be, a Fine Grained Label [RFC7172]. (The preceding is the view in the TRILL RFCs, but there is an equivalent and equally valid view in which all the fields through and including the VLAN or FGL tag are part of the TRILL Header.) The TRILL base protocol specification focuses on Ethernet as a link technology between TRILL switches, in which case there will be a link header including Outer.MacDA and Outer.MacSA fields and possibly an Outer.VLAN, all before the TRILL Header. See Figure 1 and Figure 2 in RFC 7172.
> >>
> >> Some of the above could be added to the draft for clarity.
> >>
> >> > If I understand this correctly, the idea here is to look into the inner layer 2 frames, use the flow equivalents that exist at that level, and hash them into a value that maps the flows onto the source port range.
> >>
> >> Yes.
> >>
> >> > I think this text should include a summary of the principle and note the important requirement that what is considered a flow in the inner frames must not be striped over multiple source ports, as this may lead to reordering issues due to packets taking different paths.
> >>
> >> Well, we can add some text. But when would the relative ordering matter for two TRILL Data packets where the two inner native payloads have different values for any one or more of these three fields (Inner.MacDA, Inner.MacSA, and inner VLAN/FGL tag)? If any of those fields are different, you are talking about different streams.
> >>
> >> > NAT and TRILL over IP:
> >> > Section 8.5:
> >> >
> >> > If one would like to use TRILL over IP through a NAT, then there are some very important considerations that are missing.
> >> > First, there is the need for static binding configurations, or the need to determine one's external address(es) and be able to communicate them to the peer RBridges; in addition, one must ensure there are keep-alives so that the NAT binding never times out.
> >>
> >> I think those are good points. There is an additional problem: TRILL Hellos detect neighbors with which they have 2-way connectivity by indicating, inside the Hellos that are sent, from which neighbors Hellos have been received on that port. If a NAT is involved, these neighbor addresses inside Hellos need to be mapped.
> >>
> >> > Next is the issue that there is almost zero chance of getting an IP/UDP-encapsulated TRILL payload through the NAT if it results in IP fragmentation, as NATs don't defragment and refragment on the internal side, and an IP fragment lacks the UDP port and thus can't be matched to a binding.
> >>
> >> So perhaps the recommendation should be to configure the port to use TCP if there will be fragmentation.
> >>
> >> > Also, if you would like to run IP/ESP through a NAT, then you most likely need the IP/UDP/ESP encapsulation (https://tools.ietf.org/html/rfc3948). Note that this will restrict the MTU even further and thus ensure that the 1470 requirement cannot be fulfilled over a 1500-byte MTU Ethernet infrastructure, even without additional tunnels.
> >> >
> >> > I would note that firewalls also likely have issues with IP fragments for the same reason: they require a significant amount of state to verify whether the fragments should be let through.
> >> >
> >> > In general, I think you should create a configuration that has a chance to work through most middleboxes, but I think you should require static bindings. I think that configuration is, and don't laugh now, IP/UDP/ESP/TCP/TRILL; otherwise you will not be able to have both security and reliable fragmentation of TRILL packets.
> >>
> >> OK. Thanks again for this review. It has pointed out a number of problems and, in thinking about those, I believe a couple of further problems have come to mind that I mentioned above. We'll work on a revised draft.
> >>
> >> Thanks,
> >> Donald
> >>
> >> > Cheers
> >> >
> >> > Magnus Westerlund