Hello All. I'm just a Russian GNU/Linux user, and I would like to propose one simple idea. I'll try to be as brief as possible.

A few months ago Lennart Poettering proposed a new binary log system coupled with systemd. It sparked numerous heated discussions in the Russian Linux community: many users saw a severe violation of the Unix way in it and would not like to adopt it. The new log system addresses the issues outlined by Lennart in his intro at https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs

At first I wanted to make a point-by-point comparison against the proposed approach, but I don't want this message to be that long. However, I believe that all the goals mentioned there can be achieved with my method, combined with external tools where necessary.

I'd like to propose a possibly less disruptive, yet still effective, solution to this problem: machine-readable text logs. The most famous such format is, without doubt, JSON. I will use it to illustrate my ideas, though I don't claim it is the best one for this purpose.

Long story short, here is a possible single log entry for apache:

  { "date": "2011-11-23 23:25:36.0545 +0400", "pid": 2104, "name": "apache2", "severity": 1, "sha1": "2a5162d0e83756bd559e13d13a5a7651fbe0d068", ...
    "msg": { "ip": "127.0.0.1", "req": "GET", "file": "index.html", "http_version": 1.0, "result": 200 } }

Note that this is a complete JSON object consisting of two lines: the first is the syslog header, the second is the application-defined message enclosed in the "msg" object. This still allows traditional tools like grep to be used for log processing - not much changes here. And you can still read the log with the naked eye. What makes the difference is that we have created structured objects inside a simple text file, and can now build automated tools to analyze the log of any program. Something like this:

  jparser "SELECT date, msg.file FROM apache.log WHERE msg.result = 200 AND date > 2011-11-22"

So we can query log files directly with a tool similar to XPath for JSON (JSONPath); a toy implementation of such a filter is sketched below. But probably a better solution would be to import the log into a NoSQL database and use all its power and performance for further processing. Note that this requires almost zero effort, since all such tools already exist.

Advantages over plain-text log files:

- Easy and unambiguous parsing
- Traditional processing still works, and works even better combined with some preprocessing
- Strict and formal structure of the most important fields
- A unified format for logs, config files, utility output, etc.
- Queries can be made directly against the log file
- Applications can easily add their own fields with a mere printf() without breaking any processing tools
- The log can be imported directly into a powerful NoSQL database
- Binary blobs can be added unobtrusively as base64-encoded fields
- Multiple logs can be processed at once with something similar to the SQL JOIN operator
- An index can easily be built separately for fast searching

Advantages over the proposed binary log files:

- Backward compatibility (mostly) with existing tools
- Much simpler to emit
- More reliable, and can be fixed with just vim
- Existing syslogs don't even need to be heavily modified
- Direct import into existing fast and powerful NoSQL databases (almost no conversion required)
- The format is easily extensible without breaking any compatibility
- Applications can write such logs without any external libraries, with a mere printf() (see the sketch below)
- More advanced goals like clustering can still be achieved by external means without bloating the core subsystem

But we can take this even further and use such a format not only for log files: since it can describe arbitrary data structures, it can also be used for configuration files and in command output. I was once looking through the code of Linux utilities that take their data from procfs, and the implementation made me cry: each utility has to carry its own buggy and redundant parser for an arbitrary file format. If we had a unified text format for data representation, the parsing would be something like

  int free_mem = jparser_get_num("/proc/meminfo", "MemFree");
  int free_pages = jparser_get_num("/proc/vmstat", "nr_free_pages");

and so on. The same applies to grepping the output of numerous utilities (ifconfig and others). Current scripts are barely readable, insecure and highly fragile - the simple addition of a new parameter or a change in indentation will break them. Whereas we could just write something like

  ifconfig --json | jparser "interfaces[*].inet"

to get the addresses of all interfaces regardless of formatting, indentation and the presence of other parameters (note that I'm deliberately using a different syntax here to show different possibilities).
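To make the procfs idea concrete, here is a minimal sketch of what a jparser_get_num() helper could look like. Everything here is hypothetical: the name comes from my snippet above, it assumes /proc files are already exported in the flat JSON form proposed here (which they are not today), and it uses a naive string scan instead of a real JSON parser.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical sketch: look up a top-level numeric field in a small
   * flat JSON file.  Assumes the file already uses the unified format
   * proposed above, so running it against today's /proc/meminfo would
   * simply return -1. */
  long jparser_get_num(const char *path, const char *key)
  {
      char buf[8192], pat[128];
      FILE *f = fopen(path, "r");
      if (!f)
          return -1;
      size_t n = fread(buf, 1, sizeof buf - 1, f);
      fclose(f);
      buf[n] = '\0';

      snprintf(pat, sizeof pat, "\"%s\"", key);       /* e.g. "MemFree" */
      char *p = strstr(buf, pat);
      if (!p || !(p = strchr(p + strlen(pat), ':')))  /* skip to the value */
          return -1;
      return strtol(p + 1, NULL, 10);
  }

  int main(void)
  {
      printf("MemFree: %ld\n", jparser_get_num("/proc/meminfo", "MemFree"));
      return 0;
  }

The point is not this particular implementation, of course, but that the caller's side stays one line long no matter which utility or /proc file is being read.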
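Stepping back to the log format itself, here is the emitter side of the "mere printf(), no external libraries" claim. This is only a sketch: the field layout follows the apache example at the top of this message, the values are hard-coded stand-ins, and in a real system the header fields (date, pid, name) would presumably be filled in by syslog rather than by each application.

  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
      char date[64];
      time_t now = time(NULL);

      /* "2011-11-23 23:25:36 +0400" style timestamp (no fractional
       * seconds, to keep the sketch short) */
      strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S %z", localtime(&now));

      /* one complete entry: syslog-style header plus the
       * application-defined "msg" object, emitted with plain printf() */
      printf("{ \"date\": \"%s\", \"pid\": %ld, \"name\": \"apache2\", \"severity\": 1,\n"
             "  \"msg\": { \"ip\": \"%s\", \"req\": \"%s\", \"file\": \"%s\", \"result\": %d } }\n",
             date, (long)getpid(), "127.0.0.1", "GET", "index.html", 200);
      return 0;
  }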
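And here is a toy version of the consumer side - the SELECT example above reduced to "print msg.file for every entry whose result is 200". Again purely illustrative: it assumes each entry sits on a single line and uses the exact spacing of the example, whereas a real jparser would use a proper JSON parser and a JSONPath-like query language.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* naive lookup of a numeric field, e.g. "result": 200 */
  static long num_field(const char *line, const char *key)
  {
      char pat[64];
      snprintf(pat, sizeof pat, "\"%s\":", key);
      const char *p = strstr(line, pat);
      return p ? strtol(p + strlen(pat), NULL, 10) : -1;
  }

  int main(void)
  {
      char line[4096];
      FILE *f = fopen("apache.log", "r");
      if (!f)
          return 1;
      while (fgets(line, sizeof line, f)) {
          if (num_field(line, "result") != 200)
              continue;
          /* extract the string value of msg.file */
          char *p = strstr(line, "\"file\": \"");
          if (!p)
              continue;
          p += strlen("\"file\": \"");
          char *q = strchr(p, '"');
          if (q)
              printf("%.*s\n", (int)(q - p), p);
      }
      fclose(f);
      return 0;
  }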
I understand that some work would have to be done anyway. For example, the format definitely needs to be extended with at least variables for config files, which will make the parser more complicated. I think all of this can be resolved, though. And I would like to emphasize once again that I don't think JSON is necessarily the best choice here.

So, what do you think? Is there any chance that something like this will be introduced into Unix configs, utilities and log files? And if not - why?

PS Another, maybe more readable, lisp-style format as an alternative:

  (((:date 2011-11-23 23:25:36.0545 +0400) :pid 2104 :name apache2 ...)
   (:msg :ip 127.0.0.1 :req GET :file index.html ...))

--
Alexander Sauta
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel