I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments.

For more information, please see the FAQ at <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-netmod-rfc6020bis-12
Reviewer: Dale R. Worley
Review Date: 2016-05-12
IETF LC End Date: 2016-05-12
IESG Telechat date: 2016-05-19

Summary: This draft is basically ready for publication, but has nits that should be fixed before publication.

I have a large number of editorial comments. A few of these comments address technical uncertainties, but I strongly suspect that the technical issues have long since been fixed, as this is a revision of Yang 1, and the only needed work is clarifying the text.

- Abstract

This Abstract does not list what the document does, e.g., that it defines Yang 1.1. It might also help to mention upward-incompatibility. Basically, go through section 1 to see what should be in the Abstract. Perhaps:

    YANG is a data modeling language originally designed to model configuration and state data manipulated by the Network Configuration Protocol (NETCONF), NETCONF remote procedure calls, and NETCONF notifications. This document describes the syntax and semantics of version 1.1 of the YANG language. YANG version 1.1 is a maintenance release of the YANG language, addressing ambiguities and defects in the original specification. There are a small number of upward-incompatibilities from Yang 1. This document also describes how a data model defined in a YANG module is encoded in the Extensible Markup Language (XML), and how NETCONF operations are used to manipulate the data.

- section 1.1

Each incompatibility should be marked as to whether the old, incompatible usage always causes an error "at compile time", causes an error "at run time", or results in changed behavior.
A number of later items seem to be upward-incompatibilities but not marked as such:

    o Made the "yang-version" statement mandatory.

    o Made noncharacters illegal in the built-in type "string".

- section 3

    o identifier: Used to identify different kinds of YANG items by name.

This item, unlike the others, does not start with a noun phrase. Perhaps, "A string used to ..."?

    o leaf: A data node that exists in at most one instance in the data tree. A leaf has a value but no child nodes.

I'm not sure that this is correct; a leaf schema node inside a list schema node can have many instances in a data tree. The real criterion is that it has a value but no child nodes.

    o leaf-list: Like the leaf node but defines a set of uniquely identifiable nodes rather than a single node. Each node has a value but no child nodes.

Better to say a "sequence of nodes". Without the concept that the nodes are a sequence (and can be identified by location within the sequence), there's no assurance that the nodes are uniquely identifiable (since they can have duplicate values in state data).

    o mandatory node: A mandatory node is one of: ...

      * A container node without a "presence" statement, which has at least one mandatory node as a child.

Perhaps replace the comma with "and"?

It would be useful to insert a note somewhere that all data is either "configuration" or "state" data. It's hard to learn that now, because the terms "configuration data" and "state data" are defined only by reference to RFC 6241.

There are general issues with the terms "namespace" and "prefix".

"namespace" is used to identify both the namespaces of identifiers in Yang source (6.2.1) as well as XML namespaces in the XML encodings of data trees. It would make things clearer if all uses that refer to XML namespaces used the form "XML namespace".

"prefix" is used to name both identifier prefixes in Yang source code and XMLNS prefixes.
Worse, the prefix statement is used to declare one string, which is then both the prefix used within the Yang source code and also the preferred (but not mandatory!) XMLNS prefix used to refer to the associated XML namespace. It would help if "XMLNS prefix" was used whenever talking specifically about prefixes used in XML. See also comments on section 7.1.4.

- section 3.1

There is an overall question regarding whitespace in XML in "non-significant" places. Is it allowed in the XML representation of data trees? And if so, precisely what whitespace is allowed? Also, to what degree is the whitespace shown in examples what is allowed on the wire, and to what degree is it there just to make the example easier to read? (It's possible that XML has implemented some sort of global solution for the issue of non-significant whitespace, but back when I was using it regularly, there was none.)

- section 4.1

    YANG structures data models into modules and submodules. A module can import data from other external modules, and include data from submodules.

"data" has other meanings. Perhaps change to "elements of models" or "definitions"?

    The instantiation of these groupings can refine or augment the nodes, allowing it to tailor the nodes to its particular needs.

"instantiate" is used with two meanings: the more common one is the relationship between nodes and data tree elements, and the other is the insertion of a grouping into a node tree. E.g., see the definitions of "data tree" and "uses" in section 3. It would make life easier if there was a different word for the second meaning -- perhaps "use"? In any case, this particular sentence has no context to disambiguate the two meanings.

    The conversion from YANG to YIN is lossless, so content in YIN can be round-tripped back into YANG.

This is almost certainly not true. But there is no loss of Yang semantics, or something like that, and that could be stated.
    YANG strikes a balance between high-level data modeling and low-level bits-on-the-wire encoding. The reader of a YANG module can see the high-level view of the data model while understanding how the data will be encoded on-the-wire.

Is that true in cases where the NETCONF XML wire encoding is not used? Or is that encoding mandatory in some sense, even if the environment is not NETCONF? If Yang is intended to describe data which will *always* be expressed in XML in a certain way, that fact should be introduced earlier in the document.

- section 4.2.1

Section should start with a definition of "module" and what modules are used for. The current text is a list of details about "module" without telling what the *point* of a module is.

    A module may be divided into submodules, based on the needs of the module owner.

There is a global problem that the reader needs to know the relationships between "module" and "submodule", but those are not stated explicitly anywhere. (This is probably a candidate for inclusion in section 3.) AFAICT:

A module
- is defined by a module statement in its own file
- defines a group of data trees, etc. that can be used by NETCONF
- "includes" the submodules that "belong to" it

A submodule
- is defined by a submodule statement in its own file
- defines things that can be used in the module statement
- "belongs to" a single module, which is specified in the "belongs-to" substatement of the submodule statement
- does NOT have submodules of its own

"module owner" is not defined in section 3. And is it the right term; is there an ownership relationship, or does this just mean "whoever wrote the module"? Or is there an understood social context in which a module will be written that needs to be paid attention to in this document? (E.g., regarding allocation of XML namespaces.)

There is also the risk of misunderstanding between "own" and "belong to" -- a submodule "belongs to" its module but the module doesn't "own" the submodule.
Also, the module isn't "divided" into submodules, since there is always a module statement.

Perhaps a better version of this paragraph is:

    A module may have portions of its definition separated into submodules, based on the needs of the module writer. This separation is not visible outside the text of the module and submodules; neither importation into other modules nor the data encodings are affected.

--

    The "import" statement allows a module or submodule to reference material defined in other modules.

There is no definition of "material". Clearly, it means "things that can be referenced", but this points out that there is no term for "things that can be referenced", and it would be useful to have one. Then one could state, e.g., that a module can reference ???s defined in its submodules, or it can reference ???s defined in modules that it imports.

    The "include" statement is used by a module to incorporate the contents of its submodules into the module.

In a sense, "include" isn't really like "import", because it's not optional; it's the mandatory reverse-link of the "belongs-to" statement in the submodule. (The module and all submodules automatically have access to definitions in all submodules due to their relationship.) And that has been the cause of a lot of my confusion on this topic; "include" and "import" are discussed together in many places, making it unclear that "a module includes a submodule" is really a *static* relationship, not a statement that the programmer could insert or not according to need.

A clearer statement would be:

    The "include" statement is used in a module to identify each submodule that belongs to it. The module's definitions have access to the definitions in the submodule.

It might be useful to define "include" and "belong to" in section 3.

- section 4.2.2.1

    leaf host-name {
      type string;
      description "Hostname for this system";
    }

It might be useful here to provide a forward reference to 6.3 as a reference for the syntax of statements.
If you don't already know the syntax, it can be hard to figure out that "host-name" is the node's identifier. Or perhaps the forward reference could be put at the end of 4.2.2.

- section 4.2.2.3

    A container may contain any number of child nodes of any type (leafs, lists, containers, leaf-lists, actions, and notifications).

If the plural of "leaf" is "leafs" (rather than the standard English "leaves"), that should be noted in the entry in section 3.

- section 4.2.2.4

    A list defines a sequence of list entries. Each entry is like a structure or a record instance, and is uniquely identified by the values of its key leafs. A list can define multiple key leafs and may contain any number of child nodes of any type (including leafs, lists, containers etc.).

Each entry is like a *container*, which was just defined above, which might be a more intuitive way to describe it.

Perhaps "... is uniquely identified by the values of one or more specified leaf children, the key leafs."

    XML Encoding Example:

      <user>
        <name>glocks</name>
        <full-name>Goldie Locks</full-name>
        <class>intruder</class>
      </user>
      <user>
        <name>snowey</name>
        <full-name>Snow White</full-name>
        <class>free-loader</class>
      </user>
      <user>
        <name>rzell</name>
        <full-name>Rapun Zell</full-name>
        <class>tower</class>
      </user>

I note that this example, unlike the previous ones, is not a single XML element; there is no XML element that designates the list as a whole.

That seems to be a general pattern in the XML encodings for Yang, that groups of elements can be defined in Yang, but the resulting set of elements is not wrapped in a single element in the XML. That happens with lists (where each list element is an element but the list as a whole is not), leaf-list (where there is no element surrounding the repeated leaf element), groupings, and choices with their child cases.
Perhaps this is a general pattern in XML used for network management, but it's not quite what one would expect if one thinks of data structures in a programming language. Perhaps it would be helpful to point out this pattern somewhere that global XML issues are described (e.g., whitespace issues).

- section 4.2.3

    YANG can model state data, as well as configuration data, based on the "config" statement. When a node is tagged with "config false", its subhierarchy is flagged as state data. In NETCONF, state data is reported using the <get> operation, not the <get-config> operation. Parent containers, lists, and key leafs are reported also, giving the context for the state data.

Does this mean that <get> returns the state data and <get-config> returns the configuration data? Or does <get> return both the state and configuration data? Perhaps this is not really a question for this document, but if the behavior of <get> and <get-config> is mentioned, they should be described thoroughly.

- section 4.2.4

    YANG has a set of built-in types, similar to those of many programming languages, but with some differences due to special requirements from the management domain.

"from the management domain" is awkward, and "domain" has no previous definition; better "requirements of network management".

    | enumeration | Enumerated strings |

Perhaps "one of an enumerated set of strings".

    | string | Human-readable string |

What does "human-readable" mean? Can passwords be stored in such leafs? Does this really mean "a string of Unicode characters"? Perhaps "character string" would be better.

- section 4.2.7

    YANG allows the data model to segregate incompatible nodes into distinct choices using the "choice" and "case" statements. The "choice" statement contains a set of "case" statements that define sets of schema nodes that cannot appear together. Each "case" may contain multiple nodes, but each node may appear in only one "case" under a "choice".
    When a node from one case is created in the data tree, all nodes from all other cases are implicitly deleted. The server handles the enforcement of the constraint, preventing incompatibilities from existing in the configuration.

    The choice and case nodes appear only in the schema tree but not in the data tree. The additional levels of hierarchy are not needed beyond the conceptual schema.

This description reads very oddly. AFAICT, "choice" simply defines a union type, where each alternative is a container (usually called a structure or record). Or rather, it's conceptually a container, but in an XML instance of data, there are no start-end tags for the structure as a whole (paralleling the lack of start-end tags for lists as a whole). Once you state that, it's clear why there are "sets of schema nodes that cannot appear together".

- section 4.2.8

It would help to give a general discussion of augmenting. If one module augments another, is the augmented data definition part of the data structures defined by augmenting module or the augmented one?

If I've got it right, this module:

    module /system/login/user {
      namespace "urn:user";
      prefix "user";

      list user {
        key "name";
        leaf name {
          type string;
        }
        leaf full-name {
          type string;
        }
        leaf class {
          type string;
        }
      }
    }

gives this as the XML encoding of a typical data tree:

    <user xmlns="urn:user">
      <name>alicew</name>
      <full-name>Alice N. Wonderland</full-name>
      <class>drop-out</class>
    </user>

whereas the *existence* (in some sense) of this module:

    module /system/login/user-augmenter {
      namespace "urn:user-extension";
      prefix "ext";

      augment /system/login/user {
        when "class != 'wheel'";
        leaf uid {
          type uint16 {
            range "1000 .. 30000";
          }
        }
      }
    }

changes the encoding of the data tree to:

    <user xmlns="urn:user" xmlns:ext="urn:user-extension">
      <name>alicew</name>
      <full-name>Alice N. Wonderland</full-name>
      <class>drop-out</class>
      <ext:uid>1024</ext:uid>
    </user>

What is it that triggers this action-at-a-distance?
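The distinction between an XMLNS prefix and an XML namespace, which matters for the augmented element, can be checked mechanically. The following is a minimal Python sketch and is not taken from the draft: it parses two encodings of the augmented entry that differ only in the prefix bound to "urn:user-extension". The second prefix ("other") and the helper name child_names are chosen here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Two encodings of the same augmented list entry. They differ only in the
# XMLNS prefix that is bound to the augmenting module's XML namespace.
WITH_EXT = """
<user xmlns="urn:user" xmlns:ext="urn:user-extension">
  <name>alicew</name>
  <ext:uid>1024</ext:uid>
</user>
"""

WITH_OTHER = """
<user xmlns="urn:user" xmlns:other="urn:user-extension">
  <name>alicew</name>
  <other:uid>1024</other:uid>
</user>
"""


def child_names(xml_text):
    """Return the namespace-qualified tag of every child of the root."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


names_ext = child_names(WITH_EXT)
names_other = child_names(WITH_OTHER)

# ElementTree reports tags as "{namespace-uri}local-name"; the prefix that
# appeared in the text is not part of the name.
print(names_ext)
print(names_ext == names_other)
```

A namespace-aware parser reports the same qualified names for both documents, which is why a constraint on the added elements is naturally stated in terms of the XML namespace rather than the prefix.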
    If a module augments another module, the XML encoding of the data will reflect the prefix of the augmenting module. For example, if the above augmentation were in a module with prefix "other", the XML would look like:

This is hard to interpret. Better would be:

    If a module augments another module, the XML elements that are added to the encoding are in the namespace of the augmenting module (which should preferentially be associated with the prefix of the augmenting module).

The restriction is on the XML namespace of the new element, not its XMLNS prefix.

- section 5.1

    The module is the base unit of definition in YANG. A module defines a single data model. A module can define a complete, cohesive model, or augment an existing data model with additional nodes.

What is the semantic content of "complete, cohesive"? It seems like "defines a data model" is shorter and equally accurate.

If I understand Yang correctly, every module defines a data model (which may have no leafs!), but some also augment other data models. If that's so, the use of "or" is misleading, and it would be more accurate to say:

    The module is the base unit of definition in YANG. A module defines a single data model. A module can also augment an existing data model with additional nodes.

Then again, perhaps a module that defines no data nodes cannot be referenced as a data model. If so, that should be stated.

--

    Developers of YANG modules and submodules are RECOMMENDED to choose names ...

"RECOMMENDED" is an adjective which qualifies the choice, not an adverb that qualifies the act of choosing. I think you want "... SHOULD choose ...". See RFC 2119.

    A module uses the "include" statement to include all its submodules, and the "import" statement to reference external modules. Similarly, a submodule uses the "import" statement to reference other modules.

This tends to confuse the nature of "include" and "import". I would prefer:

    A module has an "include" statement for each of its submodules.
    A module, or submodule belonging to that module, can reference definitions in the module and all submodules included by the module. A module or submodule uses the "import" statement to reference external modules. Statements in the module or submodule can reference definitions in the external module using the prefix specified in the "import" statement.

--

    For backward compatibility with YANG version 1, a submodule is ...

It's not clear why backward compatibility is needed here, since yang-version marks each module explicitly with the language version that is used.

    References to definitions in the local module MAY use the prefix notation.

I would add "... using the prefix defined for the module."

- The syntax of colon

Note there are four uses of ":" in the text:

section 5.1

    When a definition in an external module is referenced, a locally defined prefix MUST be used, followed by ":", and then the external identifier.

section 6.1.2

    A keyword is either one of the YANG keywords defined in this document, or a prefix identifier, followed by ":", followed by a language extension keyword.

section 7.1.4

    When a reference to an identifier from the imported module is used, the prefix string for the imported module is used in combination with a colon (":") and the identifier, e.g., "if:ifIndex".

section 7.19

    The statement's name is created by combining the prefix of the module in which the extension was defined, a colon (":"), and the extension's keyword, with no interleaving whitespace.

The BNF shows that in all cases, no whitespace is allowed around the ":". I think that consistent wording should be used in all four locations to make it clear that whitespace/comments/newlines are not allowed.

Also, "combined with" is unnecessarily inexact; "concatenated with" or "followed by" would be much better.

- section 5.1.1

The beginning of this section should be a discussion of the nature of revisions abstractly.
AFAICT, an (abstract) module has one or more revisions, each of which is identified by a distinct revision date. Each of these is described by one module statement, each of which is in a separate file. It seems to be preferred (but seemingly not required) that a revision of a module will have revision statements that specify the revision dates of itself and all older revisions in reverse chronological order -- the actual specification of the revision date of a module seems to be done externally to the module statement. In regard to submodules, it seems that a revision of the module specifies the revisions of all of its submodules, but a revision of a submodule only specifies the module's identifier, not its revision.

    When a module is written, it can import the current revisions of other modules, based on what is available at the time.

"current" is a dangerous word to use, because it means the narrative present moment. You want to say:

    When a module is written, it can import the revisions of other modules that are current when the module is written.

--

    When the author of the module is prepared to move to the most recently published revision of an imported module, the module is republished with an updated

s/the module is republished/the author's module is republished/

    "import" statement. By republishing with the new revision, the authors explicitly indicate their acceptance of any changes in the

s/their acceptance of any changes in/the use of the newer revision of/

    imported module.

Why is "acceptance" being used here? It has political connotations. It seems like "use of" carries less baggage.

    For submodules, the issue is related but simpler. A module or submodule that includes submodules needs to specify the revision of the included submodules. If a submodule changes, any module or submodule that includes it needs to be updated.

The second sentence is unclear.
I think you mean "In order for a module or submodule to use definitions in a new revision of a submodule, the module must be updated to include the new revision of the referenced submodule."

    If a module is not imported with a specific revision, it is undefined which exact revision is used.

Delete "exact" here; it isn't needed and is too informal. Perhaps replace with "specific". (There are other instances of "exact" with the same issue.)

- section 5.1.2

    YANG allows modeling of data in multiple hierarchies, where data may have more than one top-level node. Models that have multiple top-level nodes are sometimes convenient, and are supported by YANG.

Any single module seems to define only one data structure, the one whose elements are the items listed in the module, and Yang doesn't seem to provide for any sort of "model" definition other than a module. Or are the nodes at the top level within the module all usable as independent data structures? I.e., given a module statement, what is the set of allowed root XML elements for data trees in its XML namespace?

- section 5.1.2.1

    For example:

      module example-config {
        yang-version 1.1;
        namespace "urn:example:config";
        prefix "co";

        container system { ... }
        container routing { ... }
      }

    could be encoded in NETCONF as:

s/For example/For example, an instance of/

    NETCONF is capable of carrying any XML content as the payload in the <config> and <data> elements. The top-level nodes of YANG modules are encoded as child elements, in any order, within these elements.

This isn't very clear. AFAICT from the example:

    NETCONF <config> and <data> elements each contain sequences of instances of top-level nodes of the appropriate YANG module.

Given that this example is about a particular module, it would help if the module namespace and prefix were used correctly in the XML example.

- section 5.2

    YANG modules and submodules are typically stored in files, one module or submodule per file.

This might be clearer as "...
one module or submodule statement per file."

    The name of the file SHOULD be of the form:

      module-or-submodule-name ['@' revision-date] ( '.yang' / '.yin' )

Does the extension ('.yang' or '.yin') correspond to whether the contents of the file are in Yang or Yin syntax?

- section 5.3

    All YANG definitions are specified within a module that is bound to a particular XML namespace [XML-NAMES], which is a globally unique URI [RFC3986].

This doesn't clearly specify the range of "particular" and "unique". A better phrasing is:

    All YANG definitions are specified within a module. Each module is bound to a distinct XML namespace [XML-NAMES], which is a globally unique URI [RFC3986].

--

    XML namespaces for private modules are assigned by the organization owning the module without a central registry. Namespace URIs MUST be chosen so they cannot collide with standard or other enterprise namespaces, for example by using the enterprise or organization name in the namespace.

I don't see the use of "without a central registry"; haven't you already said the module is private? (Or is "without a central registry" a term of art?) Also, the second sentence is redundant, as you've previously said that the URI is globally unique, which implies that whoever assigned the URI has some mechanism for assuring its uniqueness.

- section 5.4

    There is no worry over conflicts if both modules define the type, since there is no ambiguity.

I think you mean "There is no ambiguity if both modules define types with the same name.".

- section 5.5

This section talks a great deal about why scoping is done, but isn't specifically clear what the rules are. (Traditionally, the rule is stated as either what range of source text a particular definition is visible in, or given an identifier use, how the matching definition is found.)

    Scoped definitions MUST NOT shadow definitions at a higher scope.
    A type or grouping cannot be defined if a higher level in the schema hierarchy has a definition with a matching identifier.

I think the second sentence is confusing, because (to me) the term "schema hierarchy" implies the data tree structure, whereas the scoping rule seems to be intended to be entirely lexical. In either case, this needs to be clarified.

- section 5.6

    Conformance is a measure of how accurately a server follows the model.

There is no model already under discussion, so you can't use "the model". Perhaps:

    Conformance to a model is a measure of how accurately a server follows a model.

- section 5.6.5

I suspect that if a server implements module A, and A imports B, the server is not required to implement B. (Even if A references groupings in B.) The answer to that should probably be stated explicitly.

    If a server lists a module C in the "/modules-state/module" list from "ietf-yang-library", and there are other modules Ms listed that import C without specifying the revision date of module C, the server MUST use the definitions from the most recent revision of C listed for modules Ms.

I think the end of that would be clearer as

    ... the server MUST implement Ms by importing the most recent revision of C listed in the "/modules-state/module" list.

--

    module b {
      yang-version 1.1;
      namespace "urn:example:b";
      prefix "b";

      revision 2015-04-04;
      revision 2015-01-01;

      typedef myenum {
        type enumeration {
          enum zero; // added in 2015-01-01
          enum one;  // added in 2015-04-04
        }
      }

      container x {  // added in 2015-01-01
        container y; // added in 2015-04-04
      }
    }

Is this the correct way to specify multiple revisions of this module?
Or is this an informal notation for the two module definitions:

    module b {
      yang-version 1.1;
      namespace "urn:example:b";
      prefix "b";

      revision 2015-04-04;
      revision 2015-01-01;

      typedef myenum {
        type enumeration {
          enum zero; // added in 2015-01-01
          enum one;  // added in 2015-04-04
        }
      }

      container x {  // added in 2015-01-01
        container y; // added in 2015-04-04
      }
    }

    module b {
      yang-version 1.1;
      namespace "urn:example:b";
      prefix "b";

      revision 2015-01-01;

      typedef myenum {
        type enumeration {
          enum zero; // added in 2015-01-01
        }
      }

      container x {  // added in 2015-01-01
      }
    }

If it is the latter, it would help the new reader to explain the convention (as no example of a multi-revision module is given in the document).

- section 6

    YANG modules use the UTF-8 [RFC3629] character encoding.

Better to say that YANG modules use the Unicode character set and are stored in files using the UTF-8 character encoding.

Verify that form-feed (0x0C) is not allowed.

This might be a good place to talk about line-breaks in Yang source files. It looks from section 14 that line-breaks MUST be either CRLF or LF. It is worth noting (either here or in 6.1.3) how line-breaks inside quoted strings are transcribed into the string's value. As now written, it seems that the line-break is transcribed identically to how it is represented in the source. That means (1) If the source is recoded with the other type of line break, the semantics of the Yang code change; and (2) if the source uses line-breaks of one type (CRLF or LF), only that type can be directly transcribed into string values. (But regardless of the source line-breaks, an LF can be transcribed into a double-quoted string with "\n". But a CRLF cannot be transcribed into a double-quoted string with escape sequences. Was that intended, or was "\r" intended to be legal?)

- section 6.1

    This section details the rules for recognizing tokens from an input stream.
Generally, language definitions intersperse the narrative text with the relevant grammar definitions. Yang's statement grammar is simple enough that one doesn't need to see the context-free part of the grammar to understand the narrative for statements. But when reading about tokenization, not having the grammar presented at the same time is quite a burden. I'd recommend duplicating the relevant productions from section 14 into the subsections of section 6.

There is some sort of exposition problem. The result of "tokenization" is that the sequence of characters of the source is converted into a sequence of "tokens". Then some subset of the tokens is discarded as being non-significant (e.g., whitespace and comments), and the remainder is parsed with a context-free grammar. Here I can't figure out what the set of tokens is. Looking at the grammar in section 14, it seems to be a context-free grammar on characters. But that implies that there is no separate tokenization phase.

An example that shows the problems:

    mod:ext

Is this one token, which is also an extension keyword, or is it a sequence of three tokens?

- section 6.1.1

    A block comment is enclosed within "/*" and "*/".

Should say "starts with "/*" and ends with the nearest following "*/"".

- section 6.1.2

    A token in YANG is either a keyword, a string, a semicolon (";"), or braces ("{" or "}").

What is the intention of this sentence? One interpretation is that it defines "token" as the union of several other classes. But that has the limitation that those classes are not well-defined in this section. E.g., what is the syntax of an "unquoted string"? The other interpretation is that the Yang file is already separated into "tokens", and that the set of tokens is further subdivided, with all tokens that are not keywords, quoted strings, semicolons, or braces being unquoted strings, whose value is the sequence of characters within the token. But the separation into tokens is not described.
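To make the second interpretation concrete, here is a minimal Python sketch of one possible separation into tokens. It is only an illustration of that reading, not the draft's definition: it assumes that tokens are delimited by whitespace, ";", "{", "}", and quote characters, that ":" is not a delimiter, and it ignores comments and the "+" concatenation entirely. The name tokenize is hypothetical.

```python
# Characters that end an unquoted run (an assumption of this sketch).
DELIMS = set(" \t\r\n;{}\"'")


def tokenize(src):
    """Split YANG source text into tokens under the assumptions above."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in " \t\r\n":
            i += 1                      # whitespace only separates tokens
        elif ch in ";{}":
            tokens.append(ch)           # punctuation is a token by itself
            i += 1
        elif ch in "\"'":
            j = i + 1
            while j < len(src) and src[j] != ch:
                # inside double quotes a backslash escapes the next character
                j += 2 if (ch == '"' and src[j] == "\\") else 1
            if j >= len(src):
                raise ValueError("unterminated quoted string")
            tokens.append(src[i:j + 1])  # keep the quotes on the token
            i = j + 1
        else:
            j = i
            while j < len(src) and src[j] not in DELIMS:
                j += 1
            tokens.append(src[i:j])     # unquoted run, ':' included
            i = j
    return tokens


example = tokenize('mod:ext "a b" { leaf x; }')
print(example)
```

Under these assumptions "mod:ext" comes out as a single token, and whether it is a keyword or an unquoted string is decided afterwards by its position in the statement. If the intended reading is different, the draft should say which delimiter set applies.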
"string" is particularly confusing. In most languages, "string" values are always quoted in source code. But it seems that in Yang, any "token" that isn't a keyword or one of ;, {, }, is automatically an "unquoted string". If that's so, it needs to be stated explicitly, since it's unusual. - section 6.1.3 If a string contains any space, tab, or newline characters, a single or double quote character, a semicolon (";"), braces ("{" or "}"), or comment sequences ("//", "/*", or "*/"), then it MUST be enclosed within double or single quotes. Is a string containing "*/" required to be quoted? I ask because the character sequence "*/" is not ambiguous if it is not preceded by "/*" -- it must be an unquoted string. If a double-quoted string contains a line break followed by space or tab characters that are used to indent the text according to the layout in the YANG file, this leading whitespace is stripped from the string, up to and including the column of the double quote character, or to the first non-whitespace character, whichever occurs first. In this process, a tab character is treated as 8 space characters. This description isn't quite careful enough. Better: If a double-quoted string contains a line break followed by space or tab characters, an initial part of this whitespace is removed from the string. The amount removed is the longest prefix whose width is no larger than the width of the prefix of Yang source line containing the opening double quote character of the string to and including the opening double quote character. For this purpose, the width of a tab character is 8 and the width of any other character is 1. This does assume that all tabs are considered to have width 8, that is, tabs do not have the usual semantics of "advance to the next column that is divisible by 8". That will sometimes cause unexpected results, e.g., if some source lines start with SPC TAB. 
(Consider that whitespace before a line break is removed, which suggests the intention is that the value of the string should depend only on its visual appearance.) Also, we're using the convention that "whitespace" does NOT include CR or LF, which is not always how the term is used. Perhaps a definition of "whitespace" should be put in section 3. There is also the special case: SPC " LF TAB x " Is the initial TAB of the second line to be removed or not? There is no whitespace removal in the second line that will exactly reach the opening double quote. As I've written it, the TAB is not removed. Within a double-quoted string (enclosed within " "), a backslash character introduces a special character, which depends on the character that immediately follows the backslash: s/introduces a special character/introduces a representation of a special character/ Verify that CR cannot be specified by an escape. If a quoted string is followed by a plus character ("+"), followed by another quoted string, the two strings are concatenated into one string, allowing multiple concatenations to build one string. Are whitespace, line endings, or comments allowed between the quoted strings and the "+"? (This description is in the lexical section, so non-significant lexemes can't be assumed to have been removed.) (See also comments on section 14.) If a quoted string is followed by a plus character ("+"), followed by another quoted string, the two strings are concatenated into one string, allowing multiple concatenations to build one string. Whitespace trimming is done before substitution of backslash-escaped characters in double-quoted strings. Concatenation is performed as the last step. I think you want to reverse the phrase order in the second sentence: "In double-quoted strings, whitespace trimming is done before substitution of backslash-escaped characters." 
However, it's not clear that that matters; only a backslash followed by whitespace would be affected by whitespace trimming, and whitespace trimming would leave the backslash followed by CR or LF -- all of those cases are invalid. Also, if you keep that second sentence, it should be moved earlier -- the first and third sentences of the paragraph apply to all quoted strings, but the second sentence only applies to double-quoted strings, and should be in a paragraph that deals with only double-quoted strings. - section 6.1.3.1 The following examples show some illegal strings: '''' - a single-quoted string cannot contain single quotes """ - a double quote must be escaped in a double-quoted string These shouldn't be described as "illegal" strings; they aren't strings, truly, but the lexing process doesn't isolate them as tokens, after which they are decreed to be improper. The first example is two single-quoted strings (''), one after the other. The second is a double-quoted string (""), followed by the double-quote that starts another double-quoted string. Better to say "The following examples are not string tokens:". - section 6.2 Identifiers can be specified as quoted or unquoted strings. If there is no lexical difference between how identifiers may be represented and how strings may be represented, and what is an identifier is distinguished only by the syntactic context, it would be helpful if that was stated explicitly in section 6.1. (Though the characters allowed in identifiers are restricted.) - section 6.3.1 The processing of extensions depends on whether support for those extensions is claimed for a given YANG parser or the tool set in which it is embedded. An unsupported extension, appearing in a YANG module as an unknown-statement (see Section 14) MAY be ignored in its entirety. Any supported extension MUST be processed in accordance with the specification governing that extension. 
This allows a loophole, where an implementation can decline to claim support for an extension and yet not ignore it. I think you want to say "An unsupported extension ... MUST be ignored in its entirety." (See also 7.20.1, which implies that an unsupported *feature* must be ignored.) If you really mean "MAY", what are the limits regarding what the server can do? - section 6.5 In an absolute schema node identifier, the first identifier after the leading slash is any top-level schema node in the local module or in all imported modules. I think "in all imported modules" should be "in an imported module". (The prefix of the identifier will tell which module.) - section 7.1 The module name follows the rules for identifiers in Section 6.2. Why not say "The module name is an identifier."? Similarly for all other occurrences of "follows the rules for identifiers". (I don't think we hang any semantics on the term "identifier", only syntax.) - section 7.1.4 See discussion of "prefix" in comments for section 1.1. I think all that is needed is to update the second paragraph to: When used inside the "module" statement, the "prefix" statement defines the prefix suggested to be used when this module is imported. To improve readability of the NETCONF XML, a NETCONF client or server that generates XML or XPath that use prefixes SHOULD use the prefix defined by the module as the XMLNS prefix to associate with the module's namespace (which may be impossible if there is a conflict with another XMLNS prefix). The important thing is that the discussion of using the prefix value as an XMLNS prefix is split into a separate paragraph, so it's clear that it is a different subject. The prefix string MAY be used to refer to definitions contained in the module, e.g., "if:ifName".
I think this should be qualified "MAY be used within the module to refer to the definitions contained"; the following paragraph suggests that a module that imports this module can specify a different prefix for its references to use, and this prefix will not work in that module. - section 7.1.5 Multiple "import" statements may be specified to import from different modules. Is this a requirement that two "import" statements in the same module MUST import different modules? If not, the sentence should be shortened to "Multiple "import" statements may be specified.", or omitted entirely. But if a module can't be imported twice, that should probably be made clearer: Multiple "import" statements may be specified, but they must import from different modules. - section 7.1.6 When a module includes a submodule, it incorporates the contents of the submodule into the node hierarchy of the module. What does this mean? The "include" statement is a top-level statement within its module, so how are its contents part of the node hierarchy -- what node is the parent of its nodes? Perhaps the meaning is that top-level nodes of the submodule are implicitly top-level nodes of the module. - section 7.2 While the primary unit in YANG is a module, a YANG module can itself be constructed out of several submodules. This doesn't really capture the essence. What you want to say is: While the primary unit in YANG is a module, a YANG module can include submodules, whose definitions are incorporated into it. - section 7.5.1 In the first style, the container has no meaning of its own, existing only to contain child nodes. I think it would be good to add here "In particular, the presence of the container element with no child elements is semantically equivalent to the absence of the container element." - section 7.5.4.3 must "ifType != 'ethernet' or " + "(ifType = 'ethernet' and ifMTU = 1500)" { ... } must "ifType != 'atm' or " + "(ifType = 'atm' and ifMTU <= 17966 and ifMTU >= 64)" { ...
Are these a preferred style? I ask because it would be shorter to say must "ifType != 'ethernet' or ifMTU = 1500" { must "ifType != 'atm' or " + "(ifMTU <= 17966 and ifMTU >= 64)" { (Perhaps the ifMTU element is optional, but even then, "and" must short-circuit for the given expressions to always be valid, so it seems that "or" must short-circuit.) - section 7.5.7 I suspect that child elements can be omitted if they're not mandatory, but their order need not match the order in the schema. - section 7.6.1 Note that if the leaf or any of its ancestors has a "when" condition or "if-feature" expression that evaluates to "false", then the default value is not in use. ISTM that this is not a "Note that" but rather a precondition of the analysis of the preceding paragraphs, so it should be moved above "Otherwise, the usage of the default value...". (See also the similar condition in 7.7.2.) - section 7.6.4 The default value MUST NOT be marked with an "if-feature" statement. I think you want "The definition of the default value..." rather than "The default value...". But it's not clear what the whole set of situations is that should be forbidden. For instance, this should work: typedef xyz { type enumeration { enum blue { if-feature blue; } ... } } leaf color { if-feature blue; type xyz; default blue; } Whereas this won't: typedef xyz { type enumeration { enum blue { if-feature blue; } ... } } leaf color { // No if-feature here. type xyz; default blue; } Is this rule only meant to cover the situation where the leaf's type is an "in-line" enum and the particular enum value has an if-feature? - section 7.7.1 It might be worth mentioning that duplicated values are allowed and are significant, since that has changed since Yang 1. I.e., even in a system ordered list, the number of list elements with any particular value must be preserved. 
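Returning to the section 7.5.4.3 comment above, the claim that the shorter "must" expression is equivalent can be checked with a small truth table. The Python sketch below is only an illustration: it models each XPath comparison as a plain boolean (the function and parameter names are hypothetical), and it deliberately does not model XPath node-set semantics for an absent ifMTU, which is the case the parenthetical remark raises.

```python
from itertools import product


def original(is_ethernet, mtu_is_1500):
    # Models: ifType != 'ethernet' or (ifType = 'ethernet' and ifMTU = 1500)
    return (not is_ethernet) or (is_ethernet and mtu_is_1500)


def shortened(is_ethernet, mtu_is_1500):
    # Models: ifType != 'ethernet' or ifMTU = 1500
    return (not is_ethernet) or mtu_is_1500


# Collect every truth assignment on which the two forms disagree.
mismatches = [
    (p, q)
    for p, q in product([False, True], repeat=2)
    if original(p, q) != shortened(p, q)
]
print(mismatches)  # an empty list means the two forms agree everywhere
```

The same check applies to the 'atm' expression by treating the range test on ifMTU as a single boolean.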
- section 7.7.2 It appears to be assumed that the server's behavior is the same if (1) a leaf-list node is specified with zero values, and (2) if the leaf-list node is absent and has no default values. Otherwise, if the leaf-list's type has a default value, and the leaf-list does not have a "min-elements" statement with a value greater than or equal to one, then the leaf-list's default value is the type's default value. In all other cases, the leaf-list does not have any default values. I think it would help to s/the type's default value/one instance of the type's default value/. Is the final condition correct? E.g., if the leaf-list's type has a default value, and it has a min-elements with value 2, then the leaf-list has no default values -- which is invalid. - section 7.7.7.1 The entries in the list are sorted according to an order determined by the system. The "description" string for the list may suggest an order to the server implementor. If not, an implementation is free to sort the entries in the most appropriate order. This is mealy-mouthed about what is required and what is recommended. Better would be to omit "If not, an implementation is free to sort the entries in the most appropriate order." - section 7.7.7.2 The entries in the list are sorted according to an order defined by the user. This order is controlled by using special XML attributes in the <edit-config> request. Actually, the entries aren't "sorted", as there is no requirement that any particular set of values always has a particular order; the user can insert 1, 2, 3 at one time, and 3, 2, 1 at another. The correct meaning is splicing these two sentences: The user orders entries in the list by using special XML attributes in the <edit-config> request. - section 7.7.8 The XML elements representing leaf-list entries MAY be interleaved with other sibling elements, ... I think it would be more accurate to say "interleaved with elements for siblings of the leaf-list node".
(See also section 7.8.5.) - section 7.8 A list entry is uniquely identified by the values of the list's keys, if defined. What does "if defined" apply to? Is the point that if keys are not defined, then list entries have no such uniqueness requirement? If the latter, then the text "Each entry ... is uniquely identified by the values of its key leafs." in section 4.2.2.4 needs to be changed. - section 7.8.2 The "key" statement, which MUST be present if the list represents configuration, and MAY be present otherwise, takes as an argument a string that specifies a space-separated list of leaf identifiers of this list. It would help to change to "list of one or more leaf identifiers" - section 7.8.3 The "unique" constraint specifies that the combined values of all the leaf instances specified in the argument string, including leafs with default values, MUST be unique within all list entry instances in which all referenced leafs exist. Should that sentence end "... within all list entry instances in which all referenced leafs exist or have default values"? - section 7.9 The "choice" statement defines a set of alternatives, only one of which may exist at any one time. Better, "only one of which may be present in any one data tree". A choice node does not exist in the data tree. This is a good phrase and should be applied to list nodes (in that there is no node for the list, only for each list element). A choice consists of a number of branches, defined with the "case" substatement. Better, "each defined with a "case" substatement". As a shorthand, the "case" statement can be omitted if the branch contains a single "anydata", "anyxml", "choice", "container", "leaf", "list", or "leaf-list" statement. In this case, the case node still exists in the schema tree, and its identifier is the same as the identifier in the branch statement. There is no "branch" statement... 
How about: As a shorthand, the "case" statement can be replaced by just the statement for the child node if the branch contains a single "anydata", "anyxml", "choice", "container", "leaf", "list", or "leaf-list" statement. In this case, the case node still exists in the schema tree, and its identifier is the same as the identifier of the child node. - section 7.9.3 The argument is the identifier of the "case" statement. Better, "The argument is the identifier of the default "case" statement." - section 7.9.5. XML Encoding Rules The choice and case nodes are not visible in XML. Use this sort of statement for lists and leaf-lists, which have no visible XML for the lists as a whole. - section 7.10 The "anydata" statement is used to represent an unknown set of nodes that can be modelled with YANG, except anyxml, ... What is the practical consequence of this restriction? It's not at all clear to me which chunks of XML could be modelled with Yang and which could not. - section 7.12 A grouping is like a "structure" or a "record" in conventional programming languages. Add, "... though no grouping node exists in the XML." - section 7.15 The "action" statement is used to define an operation connected to a specific container or list data node. Might it be better to say "list element data node"? What isn't being said explicitly is that an action that appears in a list statement is an action on a single element of the list, rather than an action on the list as a whole. Similarly for notifications, section 7.16. - section 7.15.2 The last container or list contains an XML element that ... Would "innermost" be better than "last"? - section 7.17 The "augment" statement allows a module or submodule to add to >a< schema tree defined in an external module, or >in< the current module and its submodules, and to add to the nodes from a grouping in a "uses" statement. I think adding "in" where indicated makes the sentence clearer. I've also changed a "the" to an "a" where indicated.
The question is whether the module defines a schema tree or a schema forest, a set of trees. That depends on whether each top-level node can be used independently as the root of a data tree or whether they have to be used together. See comments on section 5.1.2. - section 7.18 It would help to insert a discussion of the global purpose and use of identities. In particular, what things can refer to them? It looks like only identityref leafs can do so, but I could easily be wrong. Also, more discussion in the example (7.18.3) would be helpful. The "identity" statement is used to define a new globally unique, abstract, and untyped identity. Its only purpose is to denote its name, semantics, and existence. The two uses of "its" in the second sentence are ambiguous. I think you mean "The statement's only purpose is to denote the identity's name, semantics, and existence." Though I'm not sure that an identity has any semantics beyond its existence and base identities. - section 7.19 The statement's argument is an identifier that is the new keyword for the extension and must be followed by a block of substatements that holds detailed extension information. s/The statement's/The "extension" statement's/ -- this talks about the "extension" statement that defines the extension statement [sic!]. - section 7.20.2.1 In this example, the container "target" is implemented if any of the features "outbound-tls" or "outbound-ssh" are supported by the server. "either" is better than "any" if there are only two choices. - section 7.21.4 The "reference" statement takes as an argument a string ... Perhaps s/a string/a human-readable string/. - section 7.21.5 Note that if a data definition has both an "if-feature" and a "when", then the "if-feature" is tested first. If the XPath expression references any node that also has associated "when" statements, these "when" expressions MUST be evaluated first. There MUST NOT be any circular dependencies in these "when" expressions. 
I think this could be better phrased: If the XPath expression references any node that also has associated "when" statements, then the "when" expressions of the referenced nodes MUST be evaluated first. There MUST NOT be any circular dependencies among "when" expressions. - section 9.3.4 Can one apply a fraction-digits restriction to a derived type that already has a fraction-digits restriction? ISTM that this is sensible if the fraction-digits numbers of the two restrictions are the same. Then again, that would be pointless, and we might want to forbid fraction-digits re-restrictions. The fraction-digits statement doesn't seem to have any of the substatements (e.g., description, error-app-tag) that other restrictions (e.g., length, pattern, range) do. (The grammar in section 14 agrees.) Is this OK? It seems like a value could fail to conform to fraction-digits. - section 9.6.3 An enumeration can be restricted with the "enum" (Section 9.6.4) statement. This should be fleshed out: "... can be restricted with one or more "enum" (Section 9.6.4) statements, which enumerate a subset of the values of the base type." - section 9.7.2 The lexical representation of the bits type is a space-separated list of the individual bit values that are set. I think that should be "... list of the names of the bits that are set." I tend to think of a "bit value" as 0 or 1. - section 9.8.3 The canonical form of a binary value follows the rules in [RFC4648]. Best specify "the rules of 'Base 64 Encoding' in [RFC4648]." - section 9.9 The leafref type is used to declare a constraint on the value space of a leaf, based on a reference to a set of leaf instances in the data tree. The "path" substatement (Section 9.9.2) selects a set of leaf instances, and the leafref value space is the set of values of these leaf instances. The first sentence isn't quite right. Perhaps: The value space of a leaf with leafref type is one of the values of a set of leaf instances in the data tree.
The "path" substatement (Section 9.9.2) selects the set of leaf instances, and the leafref value space is the set of values of these leaf instances. - section 9.9.3 If "require-instance" is "true", it means that the instance being referred MUST exist for the data to be valid. This constraint is enforced according to the rules in Section 8. If "require-instance" is "false", it means that the instance being referred MAY exist in valid data. What does "the instance being referred" mean? Perhaps you meant "the instance being referred to"? Also, I think the second paragraph means that if require-instance is false, the referred-to instance need not exist. But it's not clear to me what that would mean -- if there are no instances referenced by the XPath expression, then the set of valid values of the leafref would be empty and the leafref would necessarily be invalid. ... I think these paragraphs need to be expanded in some way. (require-instance is also used in instance-identifier, which may have subtly different semantics than when it is used in leafref.) - section 9.9.4. Lexical Representation A leafref value is lexically represented the same way as the leaf it references. Probably better as "... the same way as the leaf it references represents its value." - section 9.12 It is used to repeatedly specify each member type of the union. I think you want "It is used repeatedly to specify ..." -- "repeatedly" modifies "used", not "specify". In the XML encoding, a value representing a union data type is validated consecutively against each member type, in the order they are specified in the "type" statement, until a match is found. The type that matched will be the type of the value for the node that was validated. I think a distinction needs to be made here between generating XML and interpreting XML: When generating an XML encoding, a value is encoded according to the rules of the member type to which the value belongs.
When interpreting an XML encoding, a value is validated consecutively against each member type, in the order they are specified in the "type" statement, until a match is found. The type that matched will be the type of the value for the node that was validated, and the encoding is interpreted according to the rules for that type. - section 10.3.1 If the first argument node is of type leafref, the function returns a node set that contains the nodes that the leafref refers to. I *think* that this means the set of nodes that the leafref's XPath selects, but a plausible alternative is the subset of those nodes which have the same value as the leafref does. This hinges on the meaning of "the leafref refers to a node", which isn't defined unambiguously in section 9.9. - section 11 o A "description" statement may be added or clarified without changing the semantics of the definition. Probably better to change "clarified" to "changed". (What is the specificational content of "clarified"?) o A "type" statement may be replaced with another "type" statement that does not change the syntax or semantics of the type. For example, an inline type definition may be replaced with a typedef, but an int8 type cannot be replaced by an int16, since the syntax would change. Is the last sentence correct? Any value that was previously valid would remain valid. This is similar to changing "range '-128..127'" to "range '-32768..32767'". - section 13 The translated module is called a YIN module. This section describes symmetric mapping rules between the two formats. I don't think "symmetric" is the right word; perhaps "bidirectional". - section 14 This grammar assumes that the scanner replaces YANG comments with a single space character. 
It would probably be better to explicitly include comments in the separator nonterminals: ;; 'comment' is specified in Section 6.1.1 sep = 1*(WSP / line-break / comment) ; unconditional separator optsep = *(WSP / line-break / comment) stmtsep = *(WSP / line-break / comment / unknown-statement) -- module-stmt = optsep module-keyword sep identifier-arg-str ... This has the consequence that 'module"name"{...' is invalid, because there is no separator after 'module', even though traditional tokenizers would handle it. Verify that this is intended. (This pattern is consistent for all statement types.) linkage-stmts = ;; these stmts can appear in any order *import-stmt *include-stmt This could easily be handled like body-stmts: linkage-stmts = *(import-stmt / include-stmt) -- if-feature-expr = "(" if-feature-expr ")" / if-feature-expr sep boolean-operator sep if-feature-expr / not-keyword sep if-feature-expr / identifier-ref-arg This should probably be marked ";; precedence rules specified in Section 7.20.2", as the grammar is ambiguous and does not specify the precedence, whereas typical programming language grammars encode the precedence rules. instance-identifier-specification = [require-instance-stmt] Why is require-instance-stmt in [...]? The only use of instance-identifier-specification is in type-body-stmts, and the only use of type-body-stmts is in type-stmt, where it is in [...]. Compare with numerical-restrictions: numerical-restrictions = range-stmt -- binary-specification = [length-stmt] Similarly. quoted-string = (DQUOTE string DQUOTE) / (SQUOTE string SQUOTE) string = < an unquoted string as returned by > < the scanner, that matches the rule > < yang-string > The handling of "string" is hard to follow and/or inexact. First, we need a term for the "value" of a string written in Yang source, so we can say prefix-arg-str = < a string whose ??? 
matches the rule > < prefix-arg > The text that is now used is not too difficult to understand, but it's inexact and makes it hard to write specifications about how the value is determined. (Also, there seem to be too many <...>, dividing the narrative text into two parts. The natural way to write it would be: prefix-arg-str = < a string whose ??? matches the rule prefix-arg > But it might not be worth the effort to change the use of <...> throughout section 14.) We need a production that tells exactly the syntax of "an unquoted string as returned by the scanner". We also need productions for quoted strings. I can reconstruct these: quoted-string = (single-quoted-string / double-quoted-string) *(optsep "+" optsep (single-quoted-string / double-quoted-string)) single-quoted-string = SQUOTE *(yang-char excluding "'") SQUOTE (I don't know a good way to exclude one character from a character set in ABNF.) double-quoted-string = DQUOTE *dq-char DQUOTE dq-char = yang-char excluding BACKSLASH and DQUOTE / BACKSLASH ( "n" / "t" / DQUOTE / BACKSLASH ) Verify that "optsep" is what is allowed to appear around "+". Needs a comment pointing to Section 6.1.3 as telling how to interpret quoted-strings. [END]