In the spirit of "send text," I've put together a straw-man proposal for
an easy-to-generate and fast-to-process extensible format for saving SIP
log messages:
http://www.ietf.org/internet-drafts/draft-roach-sipping-clf-syntax-00.txt
As an example of the processing this format allows, suppose
I have a large file (on the order of 1 GB of data) with
1,232,896 records in it (to choose a nice, round number). I'd like to
extract all the information about messages with a particular "To" value.
With a text-based format, I'll be reading and parsing 1,262,485,504
bytes (every byte in the file) in order to find delimiters.
With the format proposed in this document, I can open the file and then
do the following about 1,232,896 times:
- Read 4 bytes (total record length)
- Fseek forward 32 bytes to reach the "To Value" pointer and length
- Read 4 bytes (the pointer and length)
- Fseek according to those 4 bytes to the literal value of the To
header field
- Read the To header field (let's imagine it's 20 bytes)
- Fseek to the next record (according to the total record length)
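The steps above can be sketched in Python roughly as follows. This is only an illustration under assumed details: the byte order (big-endian), the split of the 4-byte "To Value" field into a 2-byte offset and a 2-byte length, and the offset being relative to the start of the record are my guesses for the sake of the example, not taken from the draft. The names find_records_by_to and make_record are hypothetical.

```python
import io
import struct

# Assumed layout (illustrative only, not taken from the draft):
#   bytes 0-3   : total record length, unsigned 32-bit, big-endian
#   bytes 4-35  : other fixed fields, skipped here
#   bytes 36-37 : offset of the To value from the start of the record
#   bytes 38-39 : length of the To value
TO_FIELD_SKIP = 32   # bytes to skip after the length word
HEADER_LEN = 40      # 4 + 32 + 4


def find_records_by_to(f, wanted):
    """Yield the start offset of each record whose To value equals `wanted`."""
    while True:
        start = f.tell()
        head = f.read(4)                       # total record length
        if len(head) < 4:
            break                              # end of file
        (rec_len,) = struct.unpack(">I", head)
        if rec_len < HEADER_LEN:
            raise ValueError(f"bad record length {rec_len} at offset {start}")
        f.seek(TO_FIELD_SKIP, io.SEEK_CUR)     # skip to the To pointer/length
        field = f.read(4)
        if len(field) < 4:
            raise ValueError(f"truncated record at offset {start}")
        ptr, length = struct.unpack(">HH", field)
        f.seek(start + ptr)                    # jump to the literal To value
        if f.read(length) == wanted:
            yield start
        f.seek(start + rec_len)                # jump to the next record


def make_record(to_value, extra=b""):
    """Build a toy record in the assumed layout (for experimenting only)."""
    rec_len = HEADER_LEN + len(to_value) + len(extra)
    return (struct.pack(">I", rec_len) + bytes(TO_FIELD_SKIP)
            + struct.pack(">HH", HEADER_LEN, len(to_value))
            + to_value + extra)
```

Passing an io.BytesIO built from a few make_record() results, or a real file opened in "rb" mode, exercises the loop. Everything outside the length word, the 4-byte pointer field, and the To value itself is seeked over rather than read.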
In total, I'm reading 28 bytes per record 1,232,896 times, for a grand
total of 34,521,088 bytes -- or about 2.7% as much data as I do with a
text file.
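For anyone who wants to check the arithmetic, here is a quick sketch. It counts only bytes read and ignores the cost of the seeks themselves, which is an assumption on my part; the constants are the figures quoted above.

```python
RECORDS = 1_232_896
TEXT_BYTES = 1_262_485_504        # whole file, read end to end
PER_RECORD = 4 + 4 + 20           # length word + To pointer/length + To value

binary_bytes = RECORDS * PER_RECORD
fraction = binary_bytes / TEXT_BYTES

print(TEXT_BYTES // RECORDS)                  # 1024 (average bytes per record)
print(binary_bytes)                           # 34521088
print(f"{fraction:.1%}")                      # 2.7%
print(round(TEXT_BYTES / binary_bytes, 1))    # 36.6
```

The last figure is the ratio of bytes read by the two approaches.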
When you're dealing with terabytes of log data, this can make the
difference between taking one minute to sift data and taking 37 minutes
to do the same operation. And, of course, it has the advantage that you
can add more (tagged) data to each record without adding any processing
load to a search like this one, since the reader simply seeks past it.
/a
_______________________________________________
Sipping mailing list https://www.ietf.org/mailman/listinfo/sipping