For the most part I second jhutz's opinion on this draft. Additional comments inline...
Jeffrey Hutzelman wrote:
My first overall thought is "why??" Not everything needs to be wrapped in XML, and in this case, it appears that there are few real benefits and a number of significant drawbacks.
It is difficult to tell from the document whether the authors actually intend this format as a substitute for programming-data formats like S-records, or as a format for transferring data dumps over the Internet. It has a number of drawbacks which would seem to make it unsuitable for the former case, and doesn't seem to offer much over a raw hex dump for the latter. It would be helpful if the authors could clarify the intended application for this format.
It appears to me that this format is primarily useful as a storage format. The authors state that the format is not secure enough by itself to ensure integrity when transferring between systems. A transport format should provide some means of ensuring that all blocks have been received in the correct order and that no additional blocks were added. This description does not attempt to provide those features; therefore I believe it is meant for storage. This should be stated clearly.
Even for a storage format it would be nice for the authors to specify a means for describing block order as well as a means for providing a checksum over the entire dump.
I would also advise the authors and other interested parties to examine draft-housley-cms-fw-wrap-09.txt, on which an IETF Last Call recently concluded. It describes a method for securely transporting firmware images over the internet and directly to hardware devices. While it is too complex to be suitable for direct programming of low-level devices, it is quite appropriate for delivery as far as the workstation, device programmer, or bootloader. [Note that I have nothing to do with that document, other than having recently reviewed it]
In any case, I see a number of problems, some of which are significant:
- This specification repeatedly uses the word "byte" to refer to an octet. Further, it prohibits representation of data with word sizes which are not multiples of 8 bits, claiming that such things are not used in "practical present-day applications". While byte sizes other than 8 bits and word sizes which are not multiples of 8 bits have become extremely uncommon in general-purpose computing devices, they are still used in more special-purpose devices, and many of the low-level devices which are within the stated scope of this document are programmed with data which uses "odd" word sizes.
The document provides a definition for "byte" and then defines "octet" as a "byte", but never uses "octet" after that. I would replace all references to "byte" with "octet" and get rid of "byte" entirely.
The restriction to "word" sizes which are multiples of "octets" seems a bit odd to me as well, especially given that the size of a block is specified in "bits".
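To illustrate why a length in bits only pays off when word sizes need not be multiples of 8, here is a minimal sketch that packs words of an arbitrary bit width into octets and reports the true bit length. The function name `pack_words` and the choice to pad the final octet with zero bits are assumptions made for illustration; nothing here comes from the draft.

```python
def pack_words(words, word_bits):
    """Pack integers of `word_bits` bits each into octets, most significant bit first.

    Returns (octets, bit_length). The final octet is padded with zero bits,
    which is an assumption of this sketch and not something the draft defines.
    """
    acc = 0
    nbits = 0
    for word in words:
        if not 0 <= word < (1 << word_bits):
            raise ValueError("word does not fit in %d bits" % word_bits)
        acc = (acc << word_bits) | word
        nbits += word_bits
    pad = (-nbits) % 8                # zero bits needed to reach an octet boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big"), nbits
```

With 12-bit words, a single word occupies two octets but only 12 bits, so a length given in bits carries information that a length in octets would lose.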
- The introduction indicates an intent to provide an alternative to formats used for "hexadecimal data" and particularly device programming data; the de facto standard "S-record" format is mentioned by name. However, it fails to capture a fundamental property of such formats, which is that they are generally simple enough to send to a device or programmer without further parsing. The authors admit that an XML parser is "not easily deployed in hardware devices", but suggest that instead a workstation should be used to convert data from the specified format into one the device can actually handle.
If this is the expected use case, then I fail to see the advantage over simply transporting the data over established file-transfer protocols (FTP, HTTP) in a format which can be directly understood by the device. Many devices can be programmed by sending the distributed image over an RS-232 connection with no preprocessing; requiring a translation step severely reduces the set of devices that can be used for this purpose. For example, it makes it unlikely that I would be able to walk around my machine room with a PDA, upgrading firmware in network devices or RAID controllers.
One example that I came up with would be a driver update package in which the dump contains different versions of the same drivers for different platforms, perhaps one for a 32-bit version of the OS and another for the 64-bit version, only one of which would actually be used. In such an example, the dump is simply a storage medium, and the processing application would selectively extract independent data streams from it for eventual delivery to the device.
Unfortunately, I feel like I am searching for a target application which should be spelled out in the document.
- This specification REQUIREs the use of SHA-1, providing no means to upgrade to an alternate hash in the future. This lack of algorithm agility is not very forward-looking.
The checksum is currently embedded in the header for each block. The problem I have with this is that it restricts the size of the block to something that can be stored in memory, and it assumes that the entire data block is available before the header is generated. It is very likely that the data being stored will come from a stream source, and there may not be enough memory to hold it all before writing to the dump media.
The checksum should be stored as a tag inside the block and the tag should contain an attribute specifying which algorithm was used.
A similar checksum tag should be available to validate the entire dump.
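A minimal sketch of the streaming case, assuming a block layout in which the checksum follows the data and names its algorithm. The element and attribute names (`<block>`, `<data>`, `<checksum algorithm="...">`) and the function `write_block` are hypothetical and are not taken from the draft.

```python
import hashlib
import io


def write_block(source, out, algorithm="sha256", chunk_size=4096):
    """Stream one block from `source` to `out`, writing the checksum last.

    `source` is a binary stream and `out` is a text stream. The XML element
    and attribute names are hypothetical, chosen only for illustration.
    """
    digest = hashlib.new(algorithm)   # algorithm is selected by name
    out.write("<block>\n")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)          # hash incrementally; the block is never buffered whole
        out.write("  <data>%s</data>\n" % chunk.hex())
    out.write('  <checksum algorithm="%s">%s</checksum>\n'
              % (algorithm, digest.hexdigest()))
    out.write("</block>\n")
    return digest.hexdigest()
```

Because the checksum element is written after the data, the writer needs only one chunk in memory at a time, and a reader can pick the hash from the attribute instead of assuming SHA-1.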
- In section 4.1, you say "if the value is untrue...". I suspect you mean something like "if the value does not match...". Further, rather than leaving the behaviour in the case of an incorrect length up to the implementation, it should be RECOMMENDED (RFC2119) that implementations reject such files.
- In section 4.2, you require the start_address attribute to be provided, even though it may not be meaningful in all cases. This attribute should be OPTIONAL.
I can see this format being used to store crash data from an application for later debugging. In this case there may be blocks which contain stack information or register contents which are not memory addressable.
- I don't believe 64 bits are required to represent word size. In fact, I question whether it is necessary for this format to represent word size at all.
I believe that word size may make sense for some types of blocks which would be stored in the dump file, but it should not be REQUIRED. I believe the most general applications would only be interested in octet streams.
- The number of blocks is OPTIONAL, but the block length is REQUIRED. Further, there is a per-block checksum but no overall checksum. These properties would seem to suggest that the intent is to allow stream-encoding by encoding an arbitrary number of relatively small blocks. This is fine, but lacking both a block count and an overall checksum, there is no way to tell whether the entire dump was transferred correctly. I would suggest adding an overall-checksum element, to be encoded after the last block (_not_ as an attribute).
If one purpose is to allow encoding an arbitrary number of small blocks, there should be some indication of whether order is important, whether blocks can be dropped, etc.
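A minimal sketch of the kind of check that an overall checksum and explicit block ordering would make possible. The per-block sequence number, the expected block count, and the overall digest are all hypothetical additions for illustration; the draft defines none of them in this form.

```python
import hashlib


def verify_dump(blocks, expected_count, expected_digest, algorithm="sha256"):
    """Check block order, block count, and a digest over all block data.

    `blocks` is an iterable of (sequence_number, data) pairs, numbered from 0.
    Raises ValueError on a reordered, missing, or extra block, or on a
    digest mismatch; returns True otherwise.
    """
    overall = hashlib.new(algorithm)
    count = 0
    for seq, data in blocks:
        if seq != count:
            raise ValueError("block %d found where block %d was expected"
                             % (seq, count))
        overall.update(data)
        count += 1
    if count != expected_count:
        raise ValueError("expected %d blocks, found %d" % (expected_count, count))
    if overall.hexdigest() != expected_digest:
        raise ValueError("overall checksum mismatch")
    return True
```

Without a block count or an overall checksum, a truncated dump whose remaining blocks each pass their own checksum is indistinguishable from a complete one.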
- Why is the number of _bits_ in a block limited to 2^64-1? This limitation seems unnecessary, given that everything else is done in terms of octets.
Why bits if the word size is restricted to octets? Why not just specify the number of words since words are already required?
- The requirement that words inside a dump be represented in network order is silly. The contents of a dump are by their nature specific to a particular device, and should be in whatever format is most appropriate for that device. Again, I question whether this format should have any notion of "words" at all.
As one of the comments in the ID Tracker stated, the byte order representation for each block should be determined by the application.
Each block should have an attribute specifying the byte order used.
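A minimal sketch of how a consumer might honor a per-block byte order attribute. The attribute values "big" and "little" and the function `decode_words` are assumptions made for illustration; the draft currently mandates network order and defines no such attribute.

```python
def decode_words(data, word_octets, byte_order):
    """Split `data` into integers of `word_octets` octets each.

    `byte_order` is "big" or "little", as it might be read from a
    hypothetical per-block attribute.
    """
    if byte_order not in ("big", "little"):
        raise ValueError("unknown byte order: %r" % (byte_order,))
    if len(data) % word_octets:
        raise ValueError("data length is not a multiple of the word size")
    return [int.from_bytes(data[i:i + word_octets], byte_order)
            for i in range(0, len(data), word_octets)]
```

The stored octets are identical in both cases; only the attribute changes how the application interprets them, which keeps the dump itself a plain octet stream.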
My biggest concern is that this format is not general enough. Because the uses the authors were considering are not spelled out, I fear that there are underlying assumptions embedded in the document which will hamper its usefulness.
Jeffrey Altman Secure Endpoints Inc.