Review of draft-ietf-hokey-erx-09

I have reviewed this document as part of the Operations and Management directorate effort. These comments were primarily written for the benefit of the O&M area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

Detailed review comments are available here:
http://www.drizzle.com/~aboba/EAP/erx-review.txt

Answers to typical O&M issues are included below:

1. Is the specification complete? Can multiple interoperable implementations be built based on the specification?

There are a few areas of the document which are unclear to me, such as how AAA routing is accomplished, whether and when peers require the local realm, and if so, how it is to be obtained. Also, clarity with respect to algorithm agility could be improved. There are also some issues with respect to the required behavior of ERX peers and servers (use of normative language).

There are also situations in which multiple approaches can be chosen (such as the various bootstrap options), without one being designated as mandatory or default. Choosing one approach would seem to be better.

In my judgement, addressing these issues would improve the likelihood of being able to build multiple interoperable implementations.

2. Is the proposed specification deployable? If not, how could it be improved?

Based on my reading of the document, it would appear that the ERX proposal requires changes to EAP peers, authenticators and servers, as well as RADIUS clients, proxies and servers. It also appears possible that changes to the lower layer protocols will be required in at least some cases, such as to make the local domain available to the peer.

Given my experience in designing and operating wireless networks, deployments requiring changes only to peers and authenticators (but not servers or RADIUS infrastructure) can take as long as 3-5 years to complete.
For example, WPA2 is still not universally deployed, even though the specification was finished in 2004. Because it also requires changes to AAA infrastructure, it seems to me that ERP will be more difficult to deploy than upgrades to the lower layer (such as IEEE 802.11r), which appear to achieve a similar objective. This puts the ERX proposal at a competitive disadvantage, and makes it unlikely that it will be widely deployed in its current form.

3. Does the proposed approach have any scaling issues that could affect usability for large scale operation?

The proposed approach introduces state into NASes, as well as RADIUS proxies and servers. This state is typically of two types: routing state and key state.

In terms of key state storage, it would appear that the RADIUS server needs to store key state for each authenticated user within the Session-Id lifetime, regardless of where they are located. Local ERX servers store state for all local users, regardless of their home realms.

In order to scale to handle a large user population, additional RADIUS servers are typically deployed, querying a replicated backend store (such as an LDAP directory). Similarly, additional RADIUS proxies are deployed to handle the forwarding load.

In conventional RADIUS deployments, proxies act much like routers, so that the failure of a RADIUS proxy will not necessarily result in failure of an EAP authentication in progress. For example, a NAS could switch over from use of one proxy to another one and as long as the same RADIUS server remained reachable, the conversation could complete normally. Similarly, while failure of a RADIUS server during a conversation will require restarting the EAP conversation, that conversation could complete normally if restarted with a new server, since all servers presumably have access to the same backend credential store.

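The difference between failover against a shared credential store and failover against per-server key state can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Server` class, `first_reachable`, and `demo` are hypothetical names invented for illustration, and nothing here is taken from the draft or from a real RADIUS implementation.

```python
"""Toy model of RADIUS failover with and without per-server key state.

All class and function names are hypothetical; nothing here is taken
from the ERX draft or from a real RADIUS implementation.
"""


class Server:
    def __init__(self, name, credential_store):
        self.name = name
        self.credential_store = credential_store  # shared, replicated backend
        self.key_cache = {}                        # local only, lost on failure
        self.up = True

    def full_auth(self, user):
        """Full EAP authentication: needs only the shared credential store."""
        if user not in self.credential_store:
            return False
        self.key_cache[user] = "key derived at " + self.name
        return True

    def reauth(self, user):
        """ERP-style re-authentication: needs key state cached on this server."""
        return user in self.key_cache


def first_reachable(servers):
    """Failover rule of the toy NAS: use the first server that is up."""
    for server in servers:
        if server.up:
            return server
    raise RuntimeError("no server reachable")


def demo():
    store = {"alice"}                       # replicated backend, shared by both
    primary = Server("primary", store)
    backup = Server("backup", store)
    servers = [primary, backup]

    results = {}
    results["initial_full_auth"] = first_reachable(servers).full_auth("alice")
    primary.up = False                      # primary fails, taking its key cache
    results["reauth_after_failover"] = first_reachable(servers).reauth("alice")
    results["full_auth_after_failover"] = first_reachable(servers).full_auth("alice")
    results["reauth_after_new_full_auth"] = first_reachable(servers).reauth("alice")
    return results


if __name__ == "__main__":
    for step, ok in demo().items():
        print(step, ok)
```

In this model a full authentication succeeds on the backup server because the credential store is shared, while a re-authentication attempted immediately after failover fails because the key cache lived only on the failed server.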
Some of these assumptions no longer apply with ERX, since RADIUS proxies and servers now store key state which is not replicated between them. Therefore RADIUS failover would disrupt the functioning of ERX in a way that it does not disrupt operation of RADIUS today. For example, if a RADIUS proxy or server goes down, all key state at that proxy/server may be lost (the document does not talk about use of stable storage to preserve keys), and therefore ERP requests will fail.

With respect to the resources required to store key state, I believe that they are manageable for the most part. Typically RADIUS servers have substantial resources associated with them, so that they are more capable of handling this kind of state than NASes, which are embedded devices.

In terms of NAS state, it would appear to me that the proposed approach scales better than existing proposals such as IEEE 802.11r, since an authenticator will only hold state for connected devices, as opposed to devices that *might* connect in the future.

My only concern would be about RADIUS proxies. In my experience, proxies are often installed in co-location facilities where repairs can be expensive and difficult, and so they are often installed on stripped-down hardware; with the current move toward flash, they may not even have a hard disk in the near future. Such stripped-down boxes may not be capable of maintaining large key caches.

4. Are there any backward compatibility issues?

There seem to be some issues with respect to backward compatibility with EAP as defined in RFC 3748 and RFC 4137. For example, the document appears to enable two packets to be in flight at the same time, and there seems to be an assumption that ERP implementations will not respond to EAP-Request/Identity packets.

A bigger problem may exist with respect to RFC 2284 implementations, which represent the bulk of existing EAP deployments.
Since RFC 2284 does not specify how peers and servers behave when encountering new EAP message types or peer-initiated messages, the behavior in the field will be implementation dependent. Hopefully, this does not include unanticipated ill effects (crashes, security compromises), but it's not possible to rule this out without testing.

There also may be issues with respect to compatibility with existing EAP lower layers. For example, it would appear to me that IEEE 802.1X-2001 (which represents the bulk of existing 802.1X deployments) does not support peer-initiated messages.

In order to minimize the backward compatibility issues, it probably makes sense for the peer not to utilize ERP unless it has an indication that it is supported on a given network and AAA server (e.g. based on pre-configuration). Currently the document does not require this.

Sections of the document relating to AAA packet routing are somewhat unclear, and may introduce changes to the way that RADIUS clients route packets. However, discussion of AAA routing seems somewhat orthogonal to the purpose of this document, so one way forward would be to move this material to the RADIUS ERP document instead.

5. Do you anticipate any manageability issues with the specification?

In today's carrier deployments, we are seeing the need for facilities such as "Hotlining", which require the ability to modify authorizations or remove key state created by a user session. RFC 5176 typically uses the User-Name as the key which the NAS uses in order to locate the state which is to be affected. However, ERP introduces state within the local ERX server as well as on the NAS, and it is not clear how this state can be removed. For example, the local ERX server may not have access to the actual User-Name, since this could be hidden within the EAP conversation.
As a result, I think that there is an implication that a user identifier such as the CUI is used to identify key state on the ERX server; however, this is not stated.

6. Does the specification introduce new potential security risks or avenues for fraud?

One of the issues introduced by "fast handoff" specifications that bypass the AAA server is that this can result in accounting packets being sent without corresponding evidence of user presence. For example, when the user is required to authenticate at each authenticator, the home server has evidence that the user was in fact present at those locations and times, even though the session times could be inflated.

With ERP, the user is only required to authenticate once within the local domain; the resulting keys then remain usable there until they expire. This could involve a continuous session, or the user could go to another domain and come back without having to re-authenticate.

To some extent, the risk can be controlled by the home server administrator by changing the key lifetime so as to require re-authentication within a given time frame. However, the document does not describe how the rIK key lifetime will relate to other lifetimes such as the Session-Id in order to accomplish this.

A more serious issue appears to arise in the "implicit bootstrap" exchange, where the DSRK request is inserted by the local ERX server in a normal EAP conversation. As specified in the document, the AAA server does not appear to have the ability to verify this request. For example, there is no requirement that the "local domain" correspond to the domain that would be returned from a PTR RR query on the NAS-IP-Address. This would seem to imply that any intermediate proxy can obtain a DSRK, and with it, the ability to submit unverifiable accounting records. This would seem to introduce a fraud risk that is not present in existing fast handoff proposals.
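As an illustration of the kind of check mentioned above, the following Python sketch compares a claimed local domain against the name returned by a reverse lookup on the NAS-IP-Address. It is only a sketch of what such a check might look like: the draft does not specify one, the function name `ptr_matches_claimed_domain` and its parameters are hypothetical, and the suffix-matching rule is an assumption. The default resolver is the standard library's `socket.gethostbyaddr`; it is injectable so that the logic can be exercised without live DNS.

```python
import socket


def ptr_matches_claimed_domain(nas_ip, claimed_domain, resolve=socket.gethostbyaddr):
    """Return True if the PTR name for nas_ip falls within claimed_domain.

    `resolve` defaults to socket.gethostbyaddr, which returns
    (hostname, aliases, addresses); it is injectable so the check can be
    exercised without live DNS. The matching rule is an assumption made
    for illustration, not something the draft specifies.
    """
    try:
        hostname = resolve(nas_ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    domain = claimed_domain.rstrip(".").lower()
    if not domain:
        return False
    # Match on a label boundary so "ample.net" does not match "example.net".
    return hostname == domain or hostname.endswith("." + domain)


if __name__ == "__main__":
    # Hypothetical resolver standing in for a real PTR lookup.
    def fake_resolver(ip):
        return ("nas1.example.net", [], [ip])

    print(ptr_matches_claimed_domain("192.0.2.10", "example.net", fake_resolver))
    print(ptr_matches_claimed_domain("192.0.2.10", "other.example", fake_resolver))
```

A check of this kind only raises the bar, since PTR records are themselves controlled by whoever operates the reverse zone for the address.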