Hi Bob,

following my post to the SUIT mailing list for feedback, here is the proposal for an updated draft to address your comments:
https://github.com/suit-wg/architecture/pull/12

Here is the link to the new document:
https://github.com/hannestschofenig/architecture/blob/patch-4/draft-ietf-suit-architecture.txt

Ciao
Hannes

-----Original Message-----
From: Bob Briscoe <ietf@xxxxxxxxxxxxxx>
Sent: Tuesday, August 11, 2020 2:51 PM
To: Bob Briscoe <ietf@xxxxxxxxxxxxxx>; tsv-art@xxxxxxxx
Cc: last-call@xxxxxxxx; draft-ietf-suit-architecture.all@xxxxxxxx; suit@xxxxxxxx
Subject: Re: [Tsv-art] Tsvart last call review of draft-ietf-suit-architecture-11

I notice that the review upload process has munged some of the line wrap. I've re-instated it below.

On 10/08/2020 00:33, Bob Briscoe via Datatracker wrote:
> Reviewer: Bob Briscoe
> Review result: Ready with Issues
>
> This document has been reviewed as part of the transport area review
> team's ongoing effort to review key IETF documents. These comments
> were written primarily for the transport area directors, but are
> copied to the document's authors and WG to allow them to address any
> issues raised and also to the IETF discussion list for information.
>
> When done at the time of IETF Last Call, the authors should consider
> this review as part of the last-call comments they receive. Please
> always CC tsv-art@xxxxxxxx if you reply to or forward this review.
>
> This review is long. For the benefit of busy readers, it is structured
> with 7 important issues listed first (and tagged either as technical
> or editorial), followed by minor editorial comments for the authors.
>
> Altho' it is ostensibly from the Transport Area Review Team, this
> review identifies only one transport-related issue (see item #6a).
> Most of the major discussion points are offered with a security hat on.
>
> First I want to say that there's a lot of useful stuff in the draft.
> So I'd like to apologize that the review comments raise issues, and do
> not dwell on praising all the good stuff.
>
> == Important Issues ==
>
> 1. Motivation for publication by the IETF [Editorial]
>
> Until I reached the summary of the recent IoT IAB workshop in the
> first para of the Security Considerations section, I was wondering why
> the IETF needed to publish this. It seemed to be a description of what
> is already done in the industry, but framed as an architecture. Most
> of this first para of the Security Considerations section motivates
> this work, and ought to be moved to the Introduction.
>
> Even then, a document that describes what the industry already does
> isn't a sufficient response to a security problem. Given (I believe)
> the intention is to encourage the industry to systematically cater for
> firmware updates, perhaps the draft needs to be a little more
> hard-hitting (without being patronizing of course). Rather than giving
> the impression (except in the abstract) that it is just describing
> current industry practice. For instance, see item #2 below about
> saying what not to do. I would also suggest that it should highlight
> the simplest architecture, only giving optional more complex extras
> later (see item #4 below).
>
> 2. Is Anything Not Allowed by this Architecture? [Technical+Editorial]
>
> a) A good architecture precludes as well as includes. Would it be
> useful to list some common practices that are insecure, and perhaps
> some common misconceptions about secure firmware update?
>
> b) I could hardly find anything in this draft that did not equally
> apply to firmware update of "Non-Things". It would indeed be useful to
> define a 'Thing' (at least what this document means by it).
> I suggest:
> * unattended operation
> * not within the operator's physical security control
>
> c) On the subject of ruling things out, I felt the list of items ruled
> out of scope in the Security Considerations include some items that
> are so central to IoT that they should not have been ruled out of
> scope, and in the first two cases quoted below, they didn't need to be
> ruled out of scope, because the document addresses them:
> "
> - installing firmware updates in a robust fashion so that the update
>   does not break the device functionality of the environment this
>   device operates in.
> - the distribution of the actual firmware update, potentially in an
>   efficient manner to a large number of devices without human
>   involvement
> - energy efficiency and battery lifetime considerations.
> "
> And, wouldn't it be better to move scoping statements to just after
> the Intro, rather than in Security Considerations? (And, yes, I know
> that not all Things are energy-challenged, but the size of the subset
> that are is significant.)
>
> 3. Relying on Software with Security Vulnerabilities to Patch Security
> Vulnerabilities [Technical]
>
> The Intro only mentions 'software updates' generally, and doesn't
> explicitly mention patching security vulnerabilities (altho the
> abstract does). Only having read the Security Considerations section,
> do I discover that the draft is primarily meant to be about patching
> firmware vulnerabilities.
>
> That raises the question of how secure it is to download new firmware
> from a device booted from firmware that is potentially already
> compromised. As a minimum, surely the draft needs to mention this
> point. And preferably:
> * whether anything can be trusted once firmware is compromised, and if so what.
> * whether it is still worth updating firmware, even once a
>   vulnerability in the firmware update process has been identified, given:
>   o identification of a vulnerability does not necessarily imply it has been
>     exploited, or not prevalently exploited
>   o a vulnerability might not make the firmware update process itself
>     vulnerable (with an explanation of how to tell)
> * describe which aspects of the firmware update process need to be run
>   within a TEE (and which not if any)
> * should the TEE lock the device against booting if a firmware
>   authentication or integrity check fails
>   o how to prevent tampering with firmware integrity from itself being used as
>     an attack, e.g.
>     - by ensuring that, once a device is locked against booting, firmware
>       re-update is never completely disabled
>     - by ensuring firmware updates are not immediately retried without an
>       exponentially increasing timer back-off, otherwise retries could lead to
>       the devices flooding their own network with fruitless update traffic.
>
> 4. Please Focus More on the Simplest Architecture [Technical]
>
> All the following increase system complexity, but are not /essential/
> for strong security:
> a) Status Tracking Per Device
> b) Confidentiality of the firmware binary
> c) Robustness against rendering the device unbootable
> d) Supporting both Message Authentication and Object Authentication (see item #5)
> e) Broadcast Friendly (see item #6)
>
> This draft is meant to be persuading the 'industry of Things' to
> provide built-in secure firmware update. It tends to fall into the
> common trap of setting the security bar so high that practitioners
> might give up in despair.
>
> a) Per-device status tracking certainly might be preferred by many
> operators, but the alternative of the operator not knowing the status
> of each individual device might be acceptable (as in the example in
> Figure 5).
> Per-device status tracking introduces the following complexity:
> * a need to separately identify each device, both on each device, and
>   in the status tracker.
> * a need to securely identify each separate device (to prevent
>   compromised devices masquerading as all the other devices to give a
>   false sense of security), requiring management of separate public or
>   shared keys
>
> b) Confidentiality certainly might provide defence in depth against
> reverse engineering the binaries, but it is ultimately security by
> obscurity, and so ultimately optional. By definition (see item #2b)
> 'Things' are not in a physically secure environment. So, unless all
> devices decrypt all downloaded binaries within a TEE and store them in
> tamper-proof memory, once the binaries are stored on each device, they
> will be accessible to external inspection anyway. So the document
> should be less dogmatic about confidentiality protection (3rd para of
> Intro), and at least explain that, with IoT, confidentiality on the
> wire is moot unless there is also confidential device storage as well.
>
> c) Robustness against rendering the device unbootable
> Often, when I initiate an (attended) firmware update, the OS warns me
> that this is a sensitive process that could render the device useless
> if the power fails part-way through. So clearly, this is a
> cost-tradeoff that device designers are willing to compromise on.
> Therefore, I don't think the IETF is entitled to pronounce a
> requirement against this practice. I would rather see this text moved
> from Requirements to somewhere else in the doc, as a commentary on the
> implementation issues, rather than stating it as a requirement.
> Climbing down a bit at the end by saying it is only an implementation
> requirement doesn't help.
>
> 5. Both Message Authentication and Whole Object Authentication?
> [Technical]
>
> Message authentication codes aren't specifically mentioned, until
> sections 7 & 8, where they are mentioned as if they might be used,
> without saying why or how. The document needs to discuss the merits of
> MACs vs. authentication of the whole manifest and/or the whole
> firmware binary.
>
> Ultimately, if an object's authenticity and integrity will be verified
> once it is fully delivered, there is no need for MACs as well.
> However, using message authentication reduces the risk that the device
> is talking with an imposter at an early stage in the transmission,
> rather than having to wait until it is complete. And it is easy to
> arrange message authentication to cumulatively authenticate the whole
> object, without additional infrastructure for whole-object
> verification. Therefore using MACs could avoid the need to provide
> enough storage for a complete update of the firmware as well as the
> current version - after verifying the manifest and the first message,
> the device could even start to overwrite the firmware it is currently
> booted from.
>
> The above strategy would not be without risk, but my point is not just
> to suggest this particular strategy. The document ought to at least
> discuss the trade-offs between MACs and whole-object
> authentication, and whether both are really necessary.
>
> 6. Friendly to Broadcast Delivery? [Technical]
>
> Section 3. states this as one of the "Requirements", although the text
> softens it to "may be desirable for some networks". However, broadcast
> delivery introduces the three significant problems below, wrt a)
> reliable transport; b) device energy efficiency; and c) broadcast
> message authentication.
>
> a) Reliable Broadcast Transport
> Delivery of binary objects needs to recover lost or corrupt packets.
> Reliable broadcast delivery at scale is extremely challenging. It
> needs either fountain coding [1] or reliable multicast.
> * Fountain coding delivers an object in a continually repeating stream
>   and ensures that the data in any missing packet can be reconstructed
>   from data in a subsequent different packet. But this would increase
>   device complexity.
> * For broadcast delivery, per-packet acknowledgements (ACKs) from each
>   device do not scale. Negative ACKs (NACKs) can be used but they also
>   do not scale. If a loss is experienced close to the root of the
>   broadcast/multicast, it still causes an implosion of negative ACKs
>   (NACKs) on the sender. Reliable multicast (e.g. PGM [RFC3208])
>   arranges a spreading tree of delivery nodes each of which handles
>   NACKs solely from its next-degree downstream neighbours. Clearly this
>   increases network or CDN complexity.
>
> b) Broadcast Energy Efficiency
> If the IoT device is wireless and needs to take care with its energy
> consumption, it will need to initiate all communications, rather than
> have to sit with its radio powered up listening for an incoming
> message. However, of course, it is not possible for each device to
> independently initiate an incoming broadcast. It would be possible for
> a broadcast to be scheduled, and for each device to poll for the
> schedule. But this would add complexity, particularly because all the
> device clocks would have to be fairly closely synchronized.
>
> c) Broadcast Message Authentication
> Message authentication has potential advantages over whole-object
> authentication (see #5). When MACs are used over unicast, typically
> the cost of asymmetric crypto for each message is avoided by using
> asymmetric crypto just once to transmit a shared key, which is then
> used to verify each MAC. However, that process is only secure for
> unicast. For broadcast or multicast delivery, the sender only sends
> each message once, using one key for the MAC that would therefore have
> to be shared with every receiver. Then any receiver could masquerade
> as the genuine sender.
> TESLA is a solution to this [RFC4082], but it
> would again increase the complexity of each device and the servers,
> not least because it requires loose clock synch (nonetheless, uTESLA
> has been implemented for challenged devices [2]).
>
> Aside regarding broadcast encryption:
> In section 3.3. "Use state-of-the-art security mechanisms", it says:
> "The information that is encrypted individually for each device must
> maintain friendliness to Content Distribution Networks, bulk storage,
> and broadcast protocols."
> That implies a magic encryption scheme that is beyond any
> state-of-the-art that I am aware of! If information is encrypted
> individually for each device, surely by definition it will not be
> friendly to broadcast protocols. Actually, I suspect the authors did
> not mean to say "encrypted individually for each device", because a
> shared group key is adequate for confidentiality - a shared group key
> is only problematic for message or source authentication (see above).
>
> 7. Missing Security Concerns [Technical]
>
> a) Avoiding Reliance on the Device's System Clock
>
> I suggest that the document makes the point that it is preferable for
> the firmware update process not to rely on the device's system clock.
>
> Reasoning: Even if the TEE maintains the system clock, protection
> against attacks on this clock relies on voting between multiple time
> sources. No amount of authentication provides any proof of message
> timing. So, it is hard for a TEE to protect against tampering with the
> timing of its messages, given they pass via the untrusted execution
> environment of the rest of the device, similar to the problem of a
> secure time source for virtualized functions [3].
>
> I think IoT developers can be reassured that none of the requirements
> for firmware update need to rely on the system clock.
> For instance
> roll-back attack prevention (section 3.4) only requires comparison
> between version numbers, not comparison between a release time and the
> clock.
>
> However, I think not relying on the clock is worth mentioning, because
> key expiry and key revocation have to be designed carefully to avoid
> relying on secure time, and this is a subtle point that might not be
> appreciated by IoT device designers.
>
> b) Key revocation
>
> When keys are in tamper-resistant storage but otherwise not within a
> physically secure site, the question of revocation surely has to be
> addressed. In particular, there should be a discussion about the
> advisability or otherwise of pre-loading the same keys into multiple
> devices.
>
> == Minor Editorial Issues ==
>
> 1. Intro
> "Updates to the firmware of an IoT device are done to fix bugs in software..."
> This would be a good place to highlight the focus on patching security
> vulnerabilities.
>
> "This version of the document assumes... Future versions may also describe..."
> I assume this aspiration needs to be deleted now?
>
> 2. Terminology
>
> There are ~22 occurrences of lower case 'must' in this document, and
> one 'should' (excluding multiple uses in rhetorical questions). I'm
> not sure whether it is intentional to make it seem like this is an RFC
> that is mandating behaviour, perhaps for readers who don't understand
> the subtleties of the IETF informational track. I would prefer it to
> be clear that this document is not mandating anything, by using
> alternatives to 'must' like 'ought to' or 'has to'. Otherwise it could
> be considered disingenuous.
>
> "The term ’system on chip (SoC)’ is often used for these types of devices."
> Perhaps more useful:
> "The term ’system on chip (SoC)’ is often used interchangeably with MCU, but
> MCU tends to imply more limited peripheral functions."
>
> "The following entities are used:"
> The list is a mix of stakeholders and functions, which tends to show
> that the authors themselves might not be clear about the distinction.
> It would be useful to split into two lists.
>
> "The terms device and
> firmware consumer are used interchangeably since the firmware
> consumer is one software component running on an MCU on the
> device."
> I didn't notice them being used interchangeably. If they are anywhere,
> why not just edit to use whichever term is more appropriate and delete
> this sentence?
>
> Status Tracker
> "While the IoT device itself runs the client-
> side of the status tracker it will most likely not run a status
> tracker itself unless it acts as a proxy for other IoT devices in
> a protocol translation or edge computing device node."
> The client-side of a status tracker surely does run a status tracker
> itself (the clue is in the name). I know what is intended, but the
> writer was clearly in two minds as to whether a status tracker is the
> combination of client and server or just the server.
>
> 3. Requirements
>
> 3.5 "High reliability" -> 'Robust against becoming unbootable'.
> The title for this requirement otherwise implies a much more general
> requirement than the description under it.
>
> 3.6 Small bootloader
> "...again using firmware updates over serial, USB or even wireless
> connectivity like a limited version of Bluetooth Smart."
> Don't see why it has to be "...a limited version of...". Suggest these
> words are deleted.
>
> s/poses a risk in reliability/
> /poses a reliability risk/
>
> s/must fit in the available RAM/
> /must fit in the available memory/
> (not necessarily RAM)
>
> s|there are not other task/processing running|
> |there are not other tasks/processes running|
>
> s/unlike it may be the case/
> /unlike that which may be the case/
>
> s/Note: This is an implementation requirement./
> /Note: This last paragraph is an implementation requirement./
> (Otherwise, 'this' could ambiguously refer to the whole requirement)
>
> 3.7 Small Parsers
> "Since parsers are known sources of bugs they must be minimal."
> To be honest, I suspect the target audience will find this sentence
> and others like it rather pious. Given the purpose of this document is
> meant to be to encourage implementers to provide secure firmware
> update, I think these peripheral "requirements" will just serve to
> make any implementers reading this feel they are being patronized.
>
> As with the earlier requirement about 'robustness against becoming
> unbootable', I think many of these 'requirements' would be easier to
> stomach within a discussion of tradeoffs, rather than as a list of
> pronouncements that demand perfection.
>
> 3.8
> s/Minimal impact on existing firmware formats/
> /No impact on existing firmware formats/
> Reason: This is what the text underneath says.
>
> 3.9 Robust permissions
>
> "...the authorization policy is separated from the
> underlying communication architecture. This is accomplished by
> separating the entities from their permissions."
> I'm not sure whether either of these sentences makes much sense (at
> least not to me). Perhaps the first sentence means to say that
> "...the authorization policy is separated from the
> firmware it applies to"
> And then the second sentence could be deleted.
> I'm not sure the second
> sentence would ever be necessary, because entities are always separate
> from their permissions (otherwise you would have to access an entity
> to find out you weren't allowed to access it). To be honest, I don't
> really see the point of the whole requirement. So if it is important,
> maybe its meaning needs to be clarified for people like me. Otherwise,
> if it's just stating the obvious, maybe it's not necessary at all.
>
> 3.10. Operating modes
> Later, in S.5. the term 'delivery modes' is used. If these are meant
> to mean the same thing, then the same term should be used
> consistently. In my experience, the term 'interaction model' is used
> to describe things like polled request-reply, push, publish-subscribe, etc.
>
> "The pre-authorisation step involves verifying..."
> When describing a distributed system, pls avoid passive sentences like
> this, which don't specify which entity is performing the action. It is
> followed up later by "...the firmware consumer must also...", which
> implies the subject is the firmware consumer, but it's best not to
> rely on implication, especially not if it requires two passes to
> understand.
>
> "Pushing a manifest and firmware image to the transfer to
> the Package resource of the LwM2M Firmware Update object"
> Garbled?
>
> "...it may need to wait for a trigger from the
> status tracker to initiate the installation, may trigger the update
> automatically, or may go through a more complex decision making
> process to determine the appropriate timing for an update"
> I had to read this a few times before realizing it was a list.
> How about:
> "... to initiate the installation, it may either need to wait for a trigger
> from the status tracker; or trigger the update automatically; or go through a
> more complex decision making process to determine the appropriate timing for
> an update"
>
> 3.11.
> s/Suitability to software and personalization data/
> /Suitability for software and personalization data/
>
> The document suddenly jumps into a different style at the start of
> 3.11, more like a log of WG activity than a requirement. Pls consider
> making the style consistent, especially given it switches back after
> the first sentence of the 2nd para.
>
> 4. Claims
> s/Only install firmware with a matching vendor/
> /Only install firmware with a matching author/ ?
>
> 5. Communication Architecture
>
> The document often repeats that it's agnostic to the communication
> architecture, then this section starts with the phrase:
> "Figure 1 shows the communication architecture..."
> Perhaps it means 'firmware update architecture'?
> Or, possibly this implies that the authors are in two minds as to what
> 'communications architecture' means. Or the heading was intended to be
> 'Communications Architectures' (plural) and the first phrase was meant to say
> "Figure 1 shows an example communication architecture..."
>
> The text needs to make it clear that a status tracker is optional in
> the client pull case but not in the server push case (see item #4a
> earlier).
>
> It would be useful for the doc to say what it means for an operator
> circle to enclose a function. For instance the 'Device Operator' in
> Fig 1 encloses the status tracker, which to me implies it controls the
> status tracker. However, the network operator encloses the device,
> which probably doesn't imply it operates the device. Perhaps an
> enclosing circle means 'within the physical security control of'? The
> network operator isn't mentioned in the text - why is it in the
> diagram, given it has no role in the firmware update, other than as a
> common carrier of opaque bits?
>
> "The following assumptions are made to allow the firmware consumer to
> verify the received firmware image and manifest before updating
> software:"
> The following three bullets aren't really assumptions.
> Perhaps
> 'statements about the verification process' would be a better phrase.
> Would another reference to suit-information-model here be useful, to
> explain why the details are not given here?
>
> See item #4b) above about highlighting that confidentiality is
> optional, not just 'deployment specific'.
>
> "There are different types of delivery modes, which are illustrated
> based on examples below."
> Shouldn't this sentence start section 5? (Also see my earlier point
> about 'operating modes' / 'interaction modes' terminology).
>
> Fig 3 is inconsistent with Fig 1, in that it omits the firmware
> consumer function.
>
> Fig 4 is inconsistent with Figs 1 & 3, in that there is also an arrow
> from the status tracker to the author. What does this imply?
>
> "This architecture does not mandate a specific delivery mode but a
> solution must support both types."
> Whatever for? This requirement surely over-plays the IETF's hand,
> which is not in a position to make such a demand? Is the intention
> really that being agnostic to the delivery mode means every solution
> must support all delivery modes?
>
> 6. Manifest
>
> Given each of the items in the second bullet list addresses one of the
> questions in the first bullet list, it would be useful to tabulate
> them side-by-side and to put them in a more meaningful order, e.g. in
> the order they occur during firmware update. Also, the first
> question bullet (author trust) is not specifically addressed in the
> second list - implied within the last bullet, but not explicitly stated.
>
> 7.1
> s/Combined with the non-relocatable nature of the code/
> /Due to the non-relocatable nature of the code/
>
> 7.3
> "This configuration has two or more CPUs in a single SoC that share
> memory (flash and RAM). Generally, they will be a protection
> mechanism to prevent one CPU from accessing the other’s memory."
> I know what is intended, but it reads as if line 1 contradicts line 3. Perhaps:
> "...
> mechanism to prevent one CPU from unintentionally accessing memory currently
> allocated to the other."
>
> 9. Example
>
> In at least one example figure, it would be useful to show the initial
> pre-loading of keys, policy logic and trust anchor into the firmware
> consumer / bootloader.
>
> s/starting with an author uploading the new firmware to firmware server/
> /starting with an author uploading the new firmware to the firmware
> server/
>
> "This setup does
> not use a status tracker and the firmware consumer component is
> therefore responsible for periodically checking whether a new
> firmware image is available for download."
> It needs to be much clearer that the status tracker has both a
> monitoring function and an update triggering function. So, altho it is
> essential in the server push model - to trigger updates, its
> monitoring function means it is not ruled out for the client pull model.
>
> Fig 5 & 6 are inconsistent, in that the former omits the IoT device
> box around the Firmware consumer and bootloader.
>
> s/Figure 6 shows an example follow with the device using a status tracker./
> /Figure 6 shows an example with the device using a status tracker./
>
> "For editorial reasons the author publishing the manifest at
> the status tracker and the firmware image at the firmware server is
> not shown."
> How about:
> "Depiction of the author publishing the manifest at
> the status tracker and the firmware image at the firmware server would
> be the same as in Figure 5. So for brevity they are not shown."
>
> 11. Security Considerations
>
> Between
> "A report about this workshop can be found at [RFC8240]."
> and
> "A standardized firmware manifest format..."
> there either needs to be some glue text to explain that the initial
> manifest format was an output of the workshop (if it was), or a new
> para if the second sentence really doesn't follow from the first.
>
> Note also that I suggest (item #1) that the motivating text about the
> workshop should be moved to the introduction. I also say (in item 2c)
> that the scoping bullets would be better at the end of the Intro too.
> However, I can also see a case for them remaining under Security
> Considerations; to admit that the document does not fully address all
> possible security concerns.
>
> Given this could leave nothing in the Security Considerations section,
> it would be appropriate to merely point to all the sections of the
> document that already cover security matters.
>
> == References ==
> [1] Byers, J.; Luby, M.; Mitzenmacher, M. & Rege, A., "A Digital
> Fountain Approach to Reliable Distribution of Bulk Data", Proc. ACM
> SIGCOMM'98, Computer Communication Review, 1998, 28
>
> [2] Perrig, A.; Szewczyk, R.; Wen, V.; Culler, D. E. & Tygar, J. D., "SPINS:
> Security Protocols for Sensor Networks", Proc. ACM International
> Conference on Mobile Computing and Networks (Mobicom'01), 2001,
> 189-199
>
> [3] Briscoe (Ed.), B. & others, "Network Functions Virtualisation;
> Security; Problem Statement", ETSI NFV Industry Specification Group
> (ISG), 2014

--
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/