On Fri, Jun 12, 2009 at 04:40:12PM -0400, Alan McKay wrote:
> Any pointers for good reading material here? Other tips?

The manuals and/or source code for your software? Stories, case studies, and reports from others in similar situations who have gone through problems?

Monitoring's job is to avert crises by letting you know things are going south before they die completely. So you probably want to figure out ways in which your setup is most likely to die, and make sure the critical points in that equation are well-monitored, and you understand the monitoring. Provided you stick with it long enough, you'll inevitably encounter a breakdown of some kind or other, which will help you refine your idea of which points are critical.

Apart from that, I find it's helpful to read about statistics and formal testing, so you have some idea how confident you can be that the monitors are accurate, that your decisions are justified, etc. But that's not everyone's cup of tea...

- Josh / eggyknap