Genart last call review of draft-ietf-ccamp-alarm-module-07

Dan Romascanu via Datatracker <noreply@xxxxxxxx> · Mon, 18 Mar 2019 13:27:26 -0700

Reviewer: Dan Romascanu
Review result: Almost Ready

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-ccamp-alarm-module-07
Reviewer: Dan Romascanu
Review Date: 2019-03-18
IETF LC End Date: 2019-03-20
IESG Telechat date: Not scheduled for a telechat

Summary:

The document defines a YANG module for Alarm management. While it's clear and
comprehensive, it ignores previous work done in the IETF and does not provide a
good motivation about the reasons of ignoring it. I would suggest to revisit
that part and provide a clear motivation for the need of this new design.

Major issues:

1. The definition of Alarm is key for the whole model. It reads like this:

> Alarm (the general concept): An alarm signifies an undesirable state in a
resource that requires corrective action.

However, RFC 3877 already defined a number of concepts including:

>  Error
      A deviation of a system from normal operation.

   Fault
      Lasting error or warning condition.

   ....

   Alarm
      Persistent indication of a fault.

I believe that there is a need to show why the model defined by RFC 3877 needs
to be changed, and why the difference that RFC 3877 was making between a Fault
and an Alarm is no longer needed. Also, RFC 3877 defined in Section 3 a
Framework and an Architecture that was consistent with X.733. This document has
no such section, and while acknowledging the need for a mapping to X.733 it
states as a goal:

> Mapping to X.733, which is a requirement for some alarm systems. Still, keep
some of the X.733 concepts out of the core model in order to make the model
small and easy to understand

More details about what is left out and why these are not needed would help.

Minor issues:

1. Section 2 makes a statement that includes

> ... While IETF has not really addressed alarm management

This is is actually not accurate. RFC 3877 addressed Alarm Management. Maybe
there is a need to revise that approach, but this should be done explicitly,
not by stating that it did not exist.

2. Section 3.5:

> Closing an alarm implies that the operator considers the corrective action
performed.

Is this always true? The undesirable state may have been cancelled by some
other event than corrective action, for example the resource is no longer used,
or the time elapsed mat have made the undesirable state irrelevant.

3. In section 3.5.1:

> Alarms are not cleared by operators, only the underlying instrumentation can
clear an alarm.  Operators can close alarms.

So, the document makes a distinction between clearing an alarm and closing an
alarm. It may be good to define two two concepts to make the distinction clear.

4. Appendix F.1:

> The alarm MIB is state oriented rather than notification oriented, an alarm
is a "lasting condition", not a discrete notification reporting about a
condition state change.

I am not sure that I understand this comment. Alarm states are defined also in
this document, and Alarms as defined here are also different than ' a discrete
notification reporting about a condition state change'. So, what does this
comment really try to say?

Nits/editorial comments: