The bugs we most dread are the ones that happen only rarely and are detected only long after the damage has been done. Given the business we are in, we will face many of them. We apparently have such bugs open at this very moment.

In most cases, the primary debugging tools one has are audit and diagnostic logs ... which WE do not have, because they are too expensive (they are written synchronously with C++ streams) to leave enabled all the time.

I think it is a mistake to think of audit and diagnostic logs as a tool to be turned on when we have a problem to debug. There should be a basic level of logging that is always enabled (so we will have data after the first instance of the bug) ... which can be cranked up from verbose to bombastic when we find a problem that won't yield to more moderate interrogation:

  (a) after the problem happens is too late to start collecting data.
  (b) these logs are gold mines of information for a myriad of purposes we cannot yet even imagine.

This can only be done if the logging mechanism is sufficiently inexpensive that we are not afraid to use it:

  - low execution artifact from the logging operations
  - reasonable memory costs for buffering
  - small enough on disk that we can keep them for months

Not having such a mechanism is (if I understand correctly) already hurting us for internal debugging, and will quickly cripple us when we have to debug problems for customers (i.e. people who cannot diagnose problems for themselves).

There are many tricks to make logging cheap, and the sizes acceptable. There are probably a dozen open-source implementations that already do what we need, and if they don't, something basic can be built in a two-digit number of hours. The real cost is not in the mechanism but in adapting existing code to use it. This cost can be mitigated by making the changes opportunistically ... one component at a time, as dictated by need/fear. But we cannot make that change-over until we have a mechanism.

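To make "cheap enough to leave on" a little more concrete, here is a minimal sketch of one possible shape for such a mechanism: fixed-size records kept in an in-memory ring, with a threshold that can be raised at run time. Every name in it (RingLog, Level, Record) is hypothetical and not taken from any existing code, and the single mutex is a simplification; a real implementation would more likely use per-thread buffers and a background writer to get records to disk.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical names throughout; this is an illustration, not an existing API.
enum class Level : int { Error = 0, Basic = 1, Verbose = 2, Bombastic = 3 };

struct Record {
    std::uint64_t seq;      // monotonically increasing sequence number
    std::int64_t  nanos;    // steady-clock timestamp in nanoseconds
    Level         level;
    char          text[96]; // fixed size, so logging never allocates
};

class RingLog {
public:
    explicit RingLog(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Crank logging up or down while the process is running.
    void set_level(Level l) {
        threshold_.store(static_cast<int>(l), std::memory_order_relaxed);
    }

    bool enabled(Level l) const {
        return static_cast<int>(l) <= threshold_.load(std::memory_order_relaxed);
    }

    // A disabled level costs one atomic load and a compare.
    // An enabled one costs a clock read and a bounded copy; no I/O happens here.
    void log(Level l, const char* msg) {
        if (!enabled(l)) return;
        auto now = std::chrono::steady_clock::now().time_since_epoch();
        std::lock_guard<std::mutex> guard(mu_);
        Record& r = slots_[next_ % slots_.size()]; // overwrite the oldest slot
        r.seq = next_++;
        r.nanos = std::chrono::duration_cast<std::chrono::nanoseconds>(now).count();
        r.level = l;
        std::strncpy(r.text, msg, sizeof(r.text) - 1);
        r.text[sizeof(r.text) - 1] = '\0';         // always NUL-terminated
    }

    // Copy out the surviving records, oldest first (e.g. on a crash or on request).
    std::vector<Record> snapshot() const {
        std::lock_guard<std::mutex> guard(mu_);
        std::vector<Record> out;
        std::uint64_t first = next_ > slots_.size() ? next_ - slots_.size() : 0;
        for (std::uint64_t s = first; s < next_; ++s)
            out.push_back(slots_[s % slots_.size()]);
        return out;
    }

private:
    mutable std::mutex mu_;
    std::vector<Record> slots_;
    std::uint64_t next_ = 0;
    std::atomic<int> threshold_{static_cast<int>(Level::Basic)};
};
```

The point of the sketch is only that memory use is bounded by the ring capacity and that the most recent records are already there when the rare bug finally shows up. Which records to persist, and in what on-disk format, is a separate decision.
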
Because the greatest cost is not the mechanism but the change-over, we should give more than passing thought to which mechanism to choose ... so that the decision we make remains a good one for the next few years. This may be something that we need to do sooner rather than later.

regards,
   ---mark---