Thanks for your additional comments! Please find our replies inline below.
Best,
/Marco
On 2024-05-13 22:47, Kyle Rose wrote:
On Fri, May 10, 2024 at 9:12 AM Marco Tiloca <marco.tiloca@xxxxx> wrote:
* The security issue outlined in section 13.5 ("Dishonest clients") adequately justifies maintaining confidentiality of the full list of revoked hashes, or at least of recent additions to the list, without reference to privacy as mentioned in section 13.1. The privacy issues posed by hashes of tokens that are not widely distributed or visible to passive observers are not at all clear to me.
==>MT
Quoting the privacy-related sentence in Section 13.1:
> Disclosing any information about revoked access tokens to entities other than the intended registered devices may result in privacy concerns.
Admittedly, it is hard to think of immediate, severe privacy consequences from revealing token hashes, and their insertion into or deletion from the update collection, to non-pertaining parties.
We included that sentence, with the intentionally open-ended "may result", as a way to err on the side of caution.
I guess I would say that unless you can articulate a specific privacy implication, stating that there "may result" some privacy concern is more CYA than actually informational to readers or implementors. When this is done broadly, it winds up sounding a bit like the CA prop 65 warnings on literally everything in the sense that it provides no useful guidance on how to evaluate alternatives against each other. So I agree with removing this *unless* you can articulate a specific privacy concern.
==>MT2
Good. Then we confirm the removal of that sentence, as already done in [PR].
[PR] https://github.com/ace-wg/ace-revoked-token-notification/pull/10
<==
* Furthermore, doesn't the security issue identified in 13.5 imply that only RSes should be notified of revocations, and clients left to wonder until their requests are denied, or at least until after the RSes to which the token is relevant have been notified? Secrecy matters really only because we want to prevent bad actors with access to the token from taking advantage of it during the window between revocation and RSes being aware of that revocation: if you notify the bad actor proactively, it doesn't really matter that you kept it secret from other nefarious observers. (Note that changing this would require changes to the recommendation in 13.4.) For what it's worth, this is not a novel problem, and it is one that plagues revocation systems in general. Moreover, all the possible approaches are unsatisfying in some way, often relying on assumptions about the state of mind/intent or expected behavior of the adversarial actor.
==>MT
We did not mean to imply that only RSs should be notified of revoked access tokens, and the intention was not to have Clients left to wonder.
In fact, an access token might be revoked because the RS is found to be compromised or is suspected to be. In such a case, informing the Client of the revocation is first of all in the Client's own interest, in order to protect it from accessing resources at an RS that is deemed malevolent or not appropriate to access.
It seems like the right way to terminate the ability of a compromised RS to communicate is to revoke its server certificate so all clients will fail (D)TLS handshakes, tokens aside. From a security engineering perspective, this is really the proper solution: revoke the credential that authenticates the server. (The access tokens, by contrast, are really credentials that authenticate and authorize the client.)
But if you decide instead to move ahead with revoking tokens as a proxy for deauthenticating the server, perhaps the guidance should be to leave the suspected compromised party wondering and only proactively notify the other parties to the token. That implies maybe having two classes of TRL: one for clients and one for RSes, and to issue the revocation to one or the other depending on the situation.
All of this is lipstick on a pig, however, as the unavoidable problem with revocation is the polling period of the list. There's a certain degree of "best effort" involved in using revocation of offline credentials to prevent interactions after learning of a compromise. Maybe that's what really should appear under security considerations: a statement of the inherent limitations of revocation lists and how that should be taken into consideration when designing systems that leverage these tokens for authentication and authorization.
==>MT2
Please note that there are different possible reasons for revoking an access token, many of which are not related to a registered device being compromised or suspected so. Examples of those reasons are listed in the second paragraph of Section 1.0 "Introduction". In such cases, there are no "bad actors".
Also related to this, the GENART review archived at [GENART-REVIEW] noted that
> ... the process(es) by which a token is declared revoked, and the method by which the AS is notified of that (and consequently updates the TRL), is out of scope. That fact is implicit in this document, but stating it ensures someone doesn't hunt through this document looking for a specification of the revocation process.
We already addressed that point by adding the following text, as a new paragraph at the end of Section 1.0 "Introduction":
> The process by which access tokens are declared revoked is out of the scope of this document. It is also out of scope the method by which the AS determines or is notified of revoked access tokens, according to which the AS consequently updates the TRL as specified in this document.
That is, upon the revocation of an access token, the AS might not know whether a registered device to which the access token pertains has been compromised or is misbehaving.
Even if the registered device is indeed compromised and the AS is aware of that, the AS is unlikely to be in the position to revoke the long-term authentication credential of that registered device, and thus to prevent other parties from interacting with it altogether. If deemed appropriate, that is definitely an important and fitting action for the issuer of such an authentication credential. (Side note: in its asymmetric mode of operation, the DTLS profile of ACE specified in RFC 9202 considers only raw public keys, not full-fledged certificates.)
What the AS is certainly and autonomously supposed to do is to officially declare an access token as revoked (irrespective of the specific reason), update its TRL accordingly, and allow the devices to which the access token pertains to learn about that, through the method defined in this document. Until a new access token is issued, this terminates the secure communication association between the Client and the Resource Server for which the access token was issued, and prevents the Client from accessing protected resources at the Resource Server.
About the polling period on the TRL, there is certainly a trade-off between that period and the ability to stay aligned with the pertaining access tokens that have been revoked. The (additional) use of CoAP Observe (RFC 7641) as a subscription mechanism helps in this respect, and the security considerations in Section 13.3 "Communication Patterns" recommend not relying solely on that, but also on an appropriately tuned polling interval.
That said, this discussion has made it clear that the document was missing further security considerations on what the TRL alone does *not* provide. That is:
* From the TRL, the registered devices learn that a pertaining token has been revoked, but not the reason why, e.g., whether that reason is a compromise, misbehavior, or decommissioning.
* In the particular case where a registered device is compromised, misbehaving, or decommissioned, it might not be enough to only revoke its pertaining access tokens. That is, the entity that authoritatively declares a registered device to be compromised, misbehaving, or decommissioned should also promptly trigger the execution of additional revocation processes as deemed appropriate. These include, for instance:
- De-registering the registered device from the AS, so that the AS does not issue further access tokens pertaining to that device.
- If applicable, revoking the public authentication credential (e.g., the public key certificate) associated with the registered device.
The methods by which these processes are triggered and carried out are out of the scope of this document.
Within Section 13 "Security Considerations", we have now added a new subsection "Additional Security Measures" discussing the limitations and additional expected actions above. This is captured in the commit at [COMMIT].
[GENART-REVIEW] https://mailarchive.ietf.org/arch/msg/ace/ETtaBMaSyoZKMD82kgG49P2cF9U/
[COMMIT] https://github.com/ace-wg/ace-revoked-token-notification/pull/10/commits/a64db82a2d000cdcc365406abde658062fa87083
<==
NEW (emphasis mine):
> This can be due to different reasons. For example, the access token has actually been revoked and the Client is not aware about that yet, while the RS has gained knowledge about that and has expunged the access token. **As another example, the access token is still valid, but an on-path active adversary might have injected a forged 4.01 (Unauthorized) response, or the RS might have deleted the access token from its local storage due to its dedicated storage space being all consumed.**
How can an on-path active adversary inject forged messages into communication between two endpoints? I admit to having very little knowledge of ACE: is the communication not end-to-end integrity protected with server authentication, a la DTLS? (If that's not the case, then frankly all bets are off.)
==>MT2
Citing also the immediately previous paragraph from the same Section 13.4 (emphasis mine):
> If a Client stores an access token that it still believes to be valid, and it accordingly attempts to access a protected resource at the RS, the Client might anyway receive an **unprotected** 4.01 (Unauthorized) response from the RS.
Thinking of the attack-free case first, the RS may have deleted the access token due to memory limitations, after which the RS terminates its secure association with the Client. Depending on the specifically used secure communication protocol, the Client might not be aware of that termination.
When the Client later sends a protected request to access a resource at the RS per the still valid access token, the RS will reply with an unprotected 4.01 (Unauthorized) response, which may specifically convey an "AS Request Creation Hints" message. Due to the above, the RS in fact has no means to protect that response anyway. (Please refer to Sections 5.10.1.1, 5.10.2, 6.4, and 6.8 of RFC 9200 for further details.)
If instead the RS is still storing an access token and it still shares an active secure communication association with the Client, an adversary can block the protected request from the Client, inject an unprotected 4.01 (Unauthorized) response in reply to the Client, and thus make the Client believe that the RS is not storing the access token anymore.
<==
1. The AS issues an access token TOKEN with a lifetime of X seconds. Instead of the 'exp' claim specifying an expiration time, TOKEN includes the 'exi' claim with value X (see Section 5.10.3 of RFC 9200). Then, the AS provides C with TOKEN.
Aha, so "exi" is the source of the problem. If that weren't permitted by spec, and explicit deterministic timestamps were required for expiration, you would eliminate this entire class of problem. I suggest doing an RFC 9200bis to remove things like this, and maybe get the entire ecosystem reviewed by security experts to avoid basic but preventable architecture flaws that unnecessarily complicate the security story.
==>MT2
Well, 'exi' certainly made it more difficult to design what is specified in Section 10.1 of this document :-)
At the same time, the introduction of 'exi' in the ACE framework was motivated by the need to "support token expiration for devices that have no reliable way of synchronizing their internal clocks" (see Section 5.10.3 of RFC 9200), which thus cannot use 'exp' anyway.
Please note that a dedicated handling of 'exi' was defined in Section 5.10.3 of RFC 9200, and its limitations/drawbacks were documented in the security considerations compiled in Section 6.6 of RFC 9200.
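To illustrate why 'exi' complicates things, here is a minimal Python sketch (not taken from RFC 9200 or from this document) of how a recipient might derive a token's expiration instant. The names TokenLifetime, deadline, and is_expired are hypothetical, and the dedicated handling of 'exi' defined in Section 5.10.3 of RFC 9200 is not modeled. With 'exp', every recipient computes the same instant; with 'exi', the instant depends on when that recipient first saw the token, which the AS cannot know in advance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenLifetime:
    """Hypothetical expiration info extracted from an access token (sketch only)."""
    exp: Optional[float] = None  # absolute expiration time ('exp'), seconds since epoch
    exi: Optional[int] = None    # relative lifetime in seconds ('exi')

    def deadline(self, first_seen: float) -> Optional[float]:
        """Instant after which this recipient considers the token expired."""
        if self.exp is not None:
            return self.exp               # identical for every recipient
        if self.exi is not None:
            return first_seen + self.exi  # depends on when *this* recipient first saw it
        return None                       # no expiration information in the token


def is_expired(lifetime: TokenLifetime, first_seen: float, now: float) -> bool:
    """True once 'now' has reached the recipient-specific deadline, if any."""
    deadline = lifetime.deadline(first_seen)
    return deadline is not None and now >= deadline


# The same 'exi' token reaching two RSs at different times expires at different instants.
token = TokenLifetime(exi=3600)
print(token.deadline(first_seen=0.0))     # 3600.0
print(token.deadline(first_seen=1800.0))  # 5400.0
```

In this sketch, only the 'exp' branch yields a deadline that the AS can predict, which is the property that Section 10.1 of this document has to work around for 'exi' tokens.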
<==
<==
Other comments:
* I did not review the properties of, or analyze the correctness of, the database consistency algorithm and associated update protocol used to keep registered devices up-to-date with relevant token hashes. I do not know if this algorithm and protocol were based on something specific, so if that is not the case then my main observation would be that the consistency requirements here are not unique to the proposed revocation function, so it may be worth reviewing literature relevant to the problem space associated with database view consistency across distributed and intermittently-connected devices to see if there is something more generic that can be leveraged in solving this problem.
==>MT
We reply separately to the two different aspects raised in the comment.
**Database consistency algorithm** - This is an important component of an AS that relies specifically on a database.
I'm using the term "database" abstractly. In the simple case, this represents what every node in the network would see if they were synchronously accessing the same single copy of the data, which is by definition always consistent with itself.
By contrast, in the case described in this document, the database (the TRL of unexpired tokens) is distributed into instances (the RSes) with updates flowing to it at different times (via TRL polling), which means you'd ideally like some kind of consistency model (e.g., sequential consistency, serializability, etc.) to allow you to reason about what two nodes acting independently might see when querying their local instances at a particular time.
This is one of the foundational problems in distributed systems. There's no reason to reinvent the wheel here. My recommendation is that if there is a model that does what you want in the distributed systems literature, you should just implement that. But deriving benefit from using a standard database consistency model might also depend on eliminating things like the aforementioned relative expiry "exi", which complicates not the database update but a node's behavior at the moment a token is received.
==>MT2
The TRL is a resource hosted only at the AS, and is not distributed into instances. In particular:
* Per Section 4, the TRL is a single data structure hosted only at the AS.
* Per Section 4.1, the AS is the only writer of the TRL, as the only entity authoritatively responsible for the information in the TRL. This is consistent with the AS being the only issuer of the access tokens under consideration.
* Per Sections 5-8, the registered devices and the administrators are readers of the TRL.
The retrieval of information from the TRL relies on RESTful interactions with the AS, specifically through a request with the idempotent and safe method GET, sent to the TRL endpoint at the AS. The corresponding response from the AS conveys information that is consistent with the current representation of the TRL at the AS at that time.
Until obtaining a next response from the AS, a node does not really "query its local instance", but instead sees what it has been storing since receiving the latest response from the AS.
When a node additionally uses CoAP Observe (RFC 7641) as a subscription mechanism, the Observe notification responses from the AS provide "eventual consistency" to that node (see Section 1.3 of RFC 7641), again with respect to the only authoritative representation of the TRL at the AS.
* Accessing the TRL at the AS is the only way for the readers to obtain that information.
That is, a registered device or administrator does not obtain that information by accessing a different endpoint at another registered device or administrator.
Any two nodes that query the TRL at the same time obtain a response that is consistent with the same, current representation of the TRL at the AS at that time. Therefore, there seems to be no particular issue to address.
Certainly, two nodes might obtain different responses built on different versions of the TRL at the AS, if they query the TRL at different times, as the AS updates the TRL.
However, the impact of a node having an outdated version of the TRL is limited to that specific node that has not yet queried the TRL since the last TRL update at the AS.
Even though each reader reasons on its own, based on its current view of the TRL, there is clearly a trade-off between the query rate used by that node and the chance that its current view is outdated. In this respect:
* The (additional) use of CoAP Observe helps, and the security considerations in Section 13.3 "Communication Patterns" recommend not relying solely on that, but also on an appropriately tuned polling interval.
* As a result of addressing the OPSDIR review archived at [OPSDIR-REVIEW], we have also extended Section 10.0 "Notification of Revoked Access Tokens" with additional text that concludes with
> In order to limit the amount of time during which the requester is unaware of pertaining access tokens that have been revoked but are not expired yet, a requester SHOULD NOT rely solely on diff query requests. In particular, a requester SHOULD also regularly send a full query request to the TRL endpoint according to a related application policy.
[OPSDIR-REVIEW] https://mailarchive.ietf.org/arch/msg/ace/ElqlgO6FHPsjoqw7L3gKkbSqVUo/
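As an illustration of that recommendation, here is a minimal Python sketch of a requester's local view of the TRL that mostly relies on diff queries but periodically falls back to a full query. It does not reproduce the message format or API defined in the document: full_query and diff_query are hypothetical stand-ins for GET requests to the TRL endpoint at the AS, token hashes are treated as opaque byte strings, and the full_every policy value is arbitrary. The point is that an update missed by the diff queries is corrected at the next full query, and that between two responses the requester only sees what it stored from the latest one.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set, Tuple

# Hypothetical hooks standing in for requests to the TRL endpoint at the AS.
FullQuery = Callable[[], Set[bytes]]                               # current hashes
DiffQuery = Callable[[], Iterable[Tuple[Set[bytes], Set[bytes]]]]  # (added, removed) per update


@dataclass
class TrlView:
    """A requester's local, possibly outdated, copy of its pertaining TRL content."""
    full_query: FullQuery
    diff_query: DiffQuery
    full_every: int = 10  # hypothetical policy: every 10th poll is a full query
    hashes: Set[bytes] = field(default_factory=set)
    polls: int = 0

    def poll(self) -> None:
        if self.polls % self.full_every == 0:
            # Full query: replace the local copy with the AS's current representation.
            self.hashes = set(self.full_query())
        else:
            # Diff query: apply the reported updates, oldest first.
            for added, removed in self.diff_query():
                self.hashes |= added
                self.hashes -= removed
        self.polls += 1

    def is_revoked(self, token_hash: bytes) -> bool:
        # Answers only from what the latest responses conveyed.
        return token_hash in self.hashes
```

For example, with full_every=3, a requester whose diff responses miss an update becomes aligned again at its next full query, i.e., at most three polls later.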
<==
Kyle
--
Marco Tiloca Ph.D., Senior Researcher
Phone: +46 (0)70 60 46 501
RISE Research Institutes of Sweden AB
Box 1263
164 29 Kista (Sweden)
Division: Digital Systems
Department: Computer Science
Unit: Cybersecurity
https://www.ri.se
-- last-call mailing list --