Re: SIP CLF Format

"Dale Worley" <dworley@xxxxxxxxxx> · Wed, 04 Feb 2009 16:25:16 -0500

On Mon, 2009-02-02 at 16:39 -0600, Vijay K. Gurbani wrote: 
> > - What exactly are the events that are logged?  E.g., section 4.1.2 
> > refers to a "method event", which is a phrase I've never heard 
> > before. Does it refer to SIP requests?
> 
> Ah, yes; another good catch.  Method event refers to SIP requests.
> I will make this more clear in a subsequent revision.

It would help if you replaced "method" with "request" throughout, since
"method" normally refers only to the method named in a request, not to
the request as a whole.

> > In which case, does it mean the *reception* of a SIP request or the
> > *sending* of a SIP request? Or is the intention to log the
> > request/response pair as one line (as would make sense in an HTTP
> > server, but less sense in a SIP proxy that implements forking)?
> 
> For SIP proxies you cannot log the request/response pair in one
> line. That is where S4.2 comes in. The directives in S4.2
> allow the representation of a [request] arriving at a proxy as 
> well as departing from the proxy (or B2BUA).

That makes sense, but I don't see a statement up front saying "one CLF
line is generated upon the receipt of each SIP message (request or
response) and one CLF line is generated upon the transmission of each SIP
message".  One has to deduce from various examples that these are the
events that generate CLF lines.

I am assuming here that all provisional responses (received and sent)
are to be logged.

But perhaps I am running up against a philosophical problem -- Is the
intention to log SIP *messages*, or to log *transactions*?  In the
former case, it's unambiguous what events generate log lines and what
the lines mean.  But in the latter case, much more care needs to be
taken to specify what is logged and when, as there isn't a one-to-one
correspondence between SIP events (transmission and receipt of messages)
and transactions.
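
To make the distinction concrete, here is a minimal Python sketch.  It
is not taken from the draft: the tuple layout and the transaction id
"s-tr-1" are assumptions made up for illustration.  It counts how many
log lines a single UAS INVITE transaction would produce under each
philosophy.

```python
from collections import defaultdict

# Hypothetical event stream for one UAS INVITE transaction.
# Each tuple is (direction, kind, summary, transaction id); none of these
# names or values come from the draft.
events = [
    ("recv", "request", "INVITE", "s-tr-1"),
    ("sent", "response", "100", "s-tr-1"),
    ("sent", "response", "180", "s-tr-1"),
    ("sent", "response", "200", "s-tr-1"),
]

# Message-based logging: every sent or received message is one line.
message_lines = len(events)

# Transaction-based logging: all messages of a transaction fold into one line.
by_transaction = defaultdict(list)
for direction, kind, summary, txn in events:
    by_transaction[txn].append(f"{direction} {summary}")
transaction_lines = len(by_transaction)

print(message_lines, transaction_lines)
```

Under message-based logging the line count follows directly from the
events; under transaction-based logging the spec has to say which
events are folded together and when the line is written.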

I suspect the intention is that for UAS transactions, you want CLF to
log transactions (and thus always fold the transmitted response into the
received request), but that for UAC transactions, and the server/client
transactions of proxies/B2BUAs, you want to log each sent and received
message separately.

The idea of using one line to give both request and response data is
natural for an HTTP server, where both happen at nearly the same moment.
But in SIP, in at least half its actions, even a UA will be a client,
where the request and response may be separated by a substantial time.

> > - In regard to the remotehost (%h) field, does "upstream" mean (as it
> >  usually does) "the source of the request that initiated this 
> > processing"?  
> 
> Yes.
> 
> > In which case, there seems to be no place to identify 
> > the host/address to which an outgoing request is sent, or the 
> > host/address from which an incoming response is received (since that 
> > is a "downstream" host).
> 
> That is captured in S4.2.  The entity to which a request is
> sent downstream is identified by the R-URI (see top of page
> 13.)  I agree this is a bit hard to follow until you sit down
> and work it out, but that is the nature of the complexity.
> Any suggestions to make it less complex would be greatly
> welcome.

The entity to which a request is sent downstream is not reliably
identified by the R-URI, due to all the rules in RFC 3263.  (Consider an
element sending a request sequentially to several different resolutions
of the R-URI in an attempt to find a server that is functioning.)  Given
our experience with sipX, I would strongly recommend that the triple
"transport/host IP address/port" be logged for all sent messages.
Otherwise CLF will be useless for diagnosing routing actions at the RFC
3263 level.
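
As a rough Python sketch of what I mean (the record type, its field
names, the slash-separated rendering, and the 192.0.2.x addresses are
all assumptions for illustration, not anything in the draft), consider
two attempts to deliver the same request after RFC 3263 resolution
yields more than one target:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentMessage:
    """Hypothetical record for one transmitted SIP request."""
    timestamp: int
    method: str
    request_uri: str
    transport: str
    dest_ip: str
    dest_port: int

    def destination(self) -> str:
        # Illustrative rendering of the transport/IP/port triple.
        return f"{self.transport}/{self.dest_ip}/{self.dest_port}"


# Two sequential attempts at the same R-URI, sent to different targets.
attempts = [
    SentMessage(1230756560, "INVITE", "sip:bob@example.com", "udp", "192.0.2.10", 5060),
    SentMessage(1230756564, "INVITE", "sip:bob@example.com", "tcp", "192.0.2.11", 5060),
]

# The R-URI cannot tell the attempts apart; the triple can.
same_ruri = len({a.request_uri for a in attempts}) == 1
destinations = [a.destination() for a in attempts]
print(same_ruri, destinations)
```

With only the R-URI in the log, both attempts look identical, so a
reader cannot tell which server was tried or which one failed.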

> > - Some care needs to be taken to describe how CANCEL and ACK are 
> > logged. (E.g., ACK for success and ACK for failure *might* be logged 
> > differently -- what is intended?)  This need is discussed in section 
> > 1, but I don't see clear prescriptions of what is to be done.
> 
> There is an example in S4.1.4; CANCEL is discussed in detail
> there.  Regarding ACK for 2xx and ACK for non-2xx, I believe
> that they ought to be logged similarly but the interpretation of
> them is different (i.e., the automata reading the CLF file will
> trigger different states based on whether or not the ACK is
> for a 2xx or non-2xx.)

I think you've got the right idea, but I would much prefer that clear
specifications be laid out in addition to the examples.

> > - Are the "..." around %c really necessary?  Or is that a legacy of 
> > the Apache format?  (And if so, does the Apache format allow %c to 
> > contain spaces?  How does that affect the identification of fields?)
> 
> Apache, of course, does not have a "%c"; it does, however, have
> a "%r", which represents the request line as it comes in from
> a browser.  The Apache CLF encloses this in a pair of braces
> since the "%r" represents three different tokens in Apache:
> the method (GET), the resource (/index.html) and the protocol
> version ("HTTP/1.0").
> 
> "%c" for SIP CLF was put in braces just to include any LWS (and
> I will probably have to look at this in more detail as the work
> moves ahead.)

(I assume you mean "quotes" there.)

Though it doesn't seem that %c can contain whitespace, since it is a
comma-separated list of URIs, or else is "-".  

I also notice that although %c is always enclosed in quotes, when there
are no contacts it is represented by a single hyphen, which would appear
as quote-hyphen-quote in a CLF line.  It would be more uniform to render
an empty %c as "", and the fields could still be parsed.  Even better,
eliminate the quotes entirely, and use a hyphen if there are no
contacts.

But it doesn't look like %c is the contact URIs, but rather the contact
name-addrs, based on the second example in section 4.1.4.  In that case,
there is a lot more trouble, because a name-addr can contain a quote.

Consider the particularly ugly example:

Contact: <sip:123@xxxxxxxxxxx>;param=" value1 value2 "

Embedded into the second example of section 4.1.4:

1230756560 192.168.1.2 alice REGISTER sip:example.com sip:alice@xxxxxxxxxxx;tag=iu8u76 sip:alice@xxxxxxxxxxx;tag=yh78 8719u@xxxxxxxxxxx 200 "<sip:123@xxxxxxxxxxx>;param=" value1 value2 "" serverid -

(I hope that comes through OK.)  A parser that looks for the end of the
quote-%c-quote field by looking for quote-space will stop after
";param=", and will take "value1" as the server transaction id and
"value2" as the client transaction id, whereas the real server
transaction id is "serverid".
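
The failure is easy to reproduce.  The Python sketch below is a
deliberately naive parser of my own devising (the function name and its
quote-space strategy are assumptions about how someone might write
one, not anything the draft prescribes), run over the line above:

```python
# The CLF line from the example above, split only for readability.
LINE = (
    '1230756560 192.168.1.2 alice REGISTER sip:example.com '
    'sip:alice@xxxxxxxxxxx;tag=iu8u76 sip:alice@xxxxxxxxxxx;tag=yh78 '
    '8719u@xxxxxxxxxxx 200 '
    '"<sip:123@xxxxxxxxxxx>;param=" value1 value2 "" serverid -'
)


def naive_parse(line):
    """Return (contact, server txn id, client txn id) the naive way.

    The quoted field is taken to end at the first quote-space pair
    after the opening quote; the two fields that follow are read as
    the transaction ids.
    """
    start = line.index('"')
    end = line.index('" ', start + 1)
    contact = line[start + 1:end]
    rest = line[end + 2:].split()
    return contact, rest[0], rest[1]


print(naive_parse(LINE))
```

The parser returns a contact truncated at ";param=" and reports
"value1" and "value2" as the two transaction ids, so "serverid" is
never seen.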

> > - What are the extension mechanisms (other than additional fields at 
> > the ends of the two line formats)?  Is there any way to identify 
> > which software version generated a particular log file, so that the 
> > reader can unambiguously determine how to interpret the extension 
> > fields?
> 
> Excellent question.  There isn't a versioning system for the
> SIP CLF itself; however, the extension mechanism is rather
> simple: you can include any other field you want besides the
> fields that constitute the CLF.  We do not restrict this,
> subject to the discussion in S5 (Security Considerations.)
> Of course, what we want to standardize are the fields we
> have listed and in the order that we have listed them in.
> These constitute the canonical definition of a "SIP CLF".
> Implementations are free to add other fields as long as they
> provide/maintain updated SIP CLF parsers.

But there is no way for a CLF parser that supports extensions to
determine which extensions are present, other than heuristically.
Maybe this is particularly important.  Perhaps we could require
that extensions start with a field formatted "token[" and end with a
field "]", so a parser could identify the start, end, and type of each
extension without specific knowledge of all extensions?
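
A minimal Python sketch of how a parser could use such framing follows.
The function name and the extension names "acme-rtt" and "acme-node"
are made up for illustration, and the sketch assumes the extension
fields have already been split on whitespace and that no extension
value is itself a bare "]".

```python
def split_extensions(fields):
    """Group trailing fields framed as 'token[' ... ']' by extension.

    Returns a list of (token, values) pairs in the order they appear.
    """
    extensions = []
    i = 0
    while i < len(fields):
        opener = fields[i]
        if not (opener.endswith("[") and len(opener) > 1):
            raise ValueError(f"expected 'token[' at field {i}, got {opener!r}")
        try:
            close = fields.index("]", i + 1)
        except ValueError:
            raise ValueError(f"extension {opener!r} is never closed") from None
        extensions.append((opener[:-1], fields[i + 1:close]))
        i = close + 1
    return extensions


# Hypothetical extension fields appended after the standard CLF fields.
trailing = "acme-rtt[ 42 ms ] acme-node[ edge7 ]".split()
print(split_extensions(trailing))
```

A parser that does not know "acme-rtt" can still skip it cleanly and
pick up the fields of the next extension, which is the point of the
framing.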

> > - Why is the to-URI (%t) present in the response format?  It's 
> > redundant with the same information in the corresponding request 
> > format.  And if we want such redundant information, why not provide 
> > %f as well?
> 
> There is a discussion of this in S4.1.3.  Should we embellish
> the discussion in there some more?

Given that %x and %y are present in response lines, and those identify
the other requests/responses in the transaction, and indirectly the
dialog, why is the to-tag (in %t) needed?

Looking at %x and %y, the server and client transaction identifiers, I
see six cases:

Line for message                %x                      %y

Incoming requests (server)      server trans id         -
or                              server trans id         FORK/-

Outgoing responses (server)     server trans id         -
or                              server trans id         FORK/-

Outgoing requests (client)      server trans id         CLIENT/client trans id

Incoming responses (client)     server trans id         CLIENT/client trans id

Based on this, it seems like the "FORK/" and "CLIENT/" are redundant,
and %y could be defined as "'-' for server transactions and the
transaction identifier for client transactions".
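
Taking the table above at face value, the mapping from the current %y
values to the simpler form can be sketched in Python as follows.  The
function names and the sample id "c-tr-1" are my own illustration, not
part of the draft.

```python
def simplify_y(y):
    """Map a %y value from the table above to the proposed simpler form."""
    if y in ("-", "FORK/-"):
        # Server transaction lines: no client transaction id to report.
        return "-"
    if y.startswith("CLIENT/"):
        # Client transaction lines: keep only the transaction identifier.
        return y[len("CLIENT/"):]
    raise ValueError(f"unrecognized %y value: {y!r}")


def is_client_line(simple_y):
    """In the simpler form, any value other than '-' marks a client line."""
    return simple_y != "-"


for y in ("-", "FORK/-", "CLIENT/c-tr-1"):
    s = simplify_y(y)
    print(y, "->", s, "client" if is_client_line(s) else "server")
```

The server/client distinction survives the simplification, which is
why the prefixes look redundant to me.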

Also, the definitions given in 4.1.1 aren't quite correct.  E.g., one
example shows a REGISTER server transaction with no server transaction
id given (because the transmitted response is folded into the same CLF
line that logs the request), but the description of %x doesn't say that
it can be omitted in that circumstance.

Dale

_______________________________________________
Sipping mailing list  https://www.ietf.org/mailman/listinfo/sipping