Re: Best practice for data encoding?

A little postscript to this discussion: I wrote a few small test programs in C to evaluate the performance of reading integers from a text file using <stdio.h> versus doing the same with direct read()s from a binary file. The difference is between two and three orders of magnitude. See http://ablog.apress.com/?p=1146

Iljitsch,

in your original question to the list, you didn't quite make clear that you were asking about BGP-style transfer of large-scale routing information.

Right now, you seem to focus on decoding performance. How much of the CPU time spent for BGP is decoding? Does the CPU time spent for the entirety of BGP even matter*? If yes, can a good data structure/encoding help with the *overall* problem?

The results from your test programs are not at all surprising.
Of course, a hand-coded loop where all the data is already in the right form (data type, byte order, number of bits), no decisions need to be made, and you even know the number of data items beforehand is going to be faster than calling the generic, pretty much neglected, parameterized, tired library routine fscanf, which doesn't get much use outside textbooks. (The "read" anomaly is caused by read(2) being an expensive system call; all other cases use a form of buffering to reduce the number of system calls.) What this example shows nicely is that performance issues are non-trivial, and, yes, you do want to run measurements, but at the system level and not at the level of "test cases" that have little or no relationship to the performance of the real system.
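
To make the comparison above concrete, here is a minimal sketch in C. It is not the code from the blog post; the function names decode_binary and decode_text are made up for illustration, and both assume the input has already been pulled into memory by one large read or by a buffered stream, so that the number of system calls is the same on both sides. What remains is the difference between copying integers that are already in host form and a hand-coded decimal parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Binary case: the buffer holds 32-bit integers already in host byte
 * order, so "decoding" is a bounded copy with no decisions to make.
 * Returns the number of integers written to out. */
size_t decode_binary(const unsigned char *buf, size_t len,
                     uint32_t *out, size_t max)
{
    size_t n = len / sizeof(uint32_t);
    if (n > max)
        n = max;
    memcpy(out, buf, n * sizeof(uint32_t));
    return n;
}

static int is_space(char c)
{
    return c == ' ' || c == '\n' || c == '\t' || c == '\r';
}

/* Text case: whitespace-separated unsigned decimal numbers.
 * Stops at the first character that is neither a digit nor whitespace.
 * Assumption for this sketch: values fit in 32 bits, so overflow is
 * not checked. Returns the number of integers written to out. */
size_t decode_text(const char *buf, size_t len,
                   uint32_t *out, size_t max)
{
    size_t i = 0, n = 0;

    while (i < len && n < max) {
        while (i < len && is_space(buf[i]))
            i++;
        if (i >= len || buf[i] < '0' || buf[i] > '9')
            break;
        uint32_t v = 0;
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {
            v = v * 10 + (uint32_t)(buf[i] - '0');
            i++;
        }
        out[n++] = v;
    }
    return n;
}
```

Timing these two against each other on the same in-memory data would isolate the parsing cost from the system call cost. It would still say nothing about how much either matters inside a real BGP implementation.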

If you really care about the performance of text-based protocols, you cannot ignore modern tools like Ragel. If, having used them, you still manage to find the text processing overhead in your profiling data, I'd like to hear from you.
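
Ragel compiles a description of a regular language into a state machine in the host language. The hand-written C below is not Ragel output and does not use Ragel syntax; it is only a sketch of the shape of such a machine: one pass over the input, one switch on the current state, no backtracking and no library calls. The grammar (an IPv4 prefix written as "a.b.c.d/len") and the name parse_prefix are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum state { S_OCTET_START, S_OCTET, S_LEN_START, S_LEN };

/* Parse "a.b.c.d/len" from s[0..n-1] in a single pass.
 * On success stores the address (host order) and prefix length and
 * returns 0; on any syntax or range error returns -1.
 * Leading zeros in an octet are accepted in this sketch. */
int parse_prefix(const char *s, size_t n, uint32_t *addr, unsigned *plen)
{
    enum state st = S_OCTET_START;
    uint32_t a = 0;
    unsigned octet = 0, octets = 0, len = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        int digit = (c >= '0' && c <= '9');

        switch (st) {
        case S_OCTET_START:            /* first digit of an octet */
            if (!digit)
                return -1;
            octet = c - '0';
            st = S_OCTET;
            break;
        case S_OCTET:                  /* more digits, '.' or '/' */
            if (digit) {
                octet = octet * 10 + (c - '0');
                if (octet > 255)
                    return -1;
            } else if (c == '.' && octets < 3) {
                a = (a << 8) | octet;
                octets++;
                st = S_OCTET_START;
            } else if (c == '/' && octets == 3) {
                a = (a << 8) | octet;
                octets++;
                st = S_LEN_START;
            } else {
                return -1;
            }
            break;
        case S_LEN_START:              /* first digit of the length */
            if (!digit)
                return -1;
            len = c - '0';
            st = S_LEN;
            break;
        case S_LEN:                    /* further digits of the length */
            if (!digit)
                return -1;
            len = len * 10 + (c - '0');
            if (len > 32)
                return -1;
            break;
        }
    }
    if (st != S_LEN)                   /* input ended too early */
        return -1;
    *addr = a;
    *plen = len;
    return 0;
}
```

A generated machine would typically be table-driven or goto-driven rather than a switch, but the per-byte work is of the same order.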

Still, for BGP, a binary protocol encoding may be a better fit because routing tables are so much about bits and prefixes and other numeric information already designed to be used in binary protocol encodings. Also, it may be easier to reduce both data rate and processing by exploiting more of the structure of the BGP routing information. (I.e., to make it redundantly clear, I would probably choose binary here, but not for the reasons given in your blog post.)
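
As an illustration of exploiting the structure of the data, BGP-4 already encodes each IPv4 prefix as one length octet (in bits) followed by only as many address octets as that length requires. The sketch below follows that layout; the function name encode_prefix and the buffer interface are assumptions made for this example, not taken from any implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode an IPv4 prefix as <length in bits> <ceil(length/8) octets>.
 * addr is in host order with the most significant octet first.
 * Bits beyond plen in the last octet are copied as-is, not masked.
 * Returns the number of bytes written, or 0 if plen > 32 or the
 * output buffer is too small. */
size_t encode_prefix(uint32_t addr, unsigned plen,
                     unsigned char *out, size_t cap)
{
    if (plen > 32)
        return 0;

    size_t nbytes = (plen + 7) / 8;    /* octets needed for plen bits */
    if (cap < 1 + nbytes)
        return 0;

    out[0] = (unsigned char)plen;
    for (size_t i = 0; i < nbytes; i++)
        out[1 + i] = (unsigned char)(addr >> (24 - 8 * i));
    return 1 + nbytes;
}
```

With this layout 192.0.2.0/24 takes 4 bytes and 10.0.0.0/8 takes 2, against 12 and 10 characters for the textual forms, before any framing or separators are counted.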

Regards, Carsten

*) Yes, that's a trick question to elicit responses :-)

