> On Wed, 2023-03-15 at 07:36 +0100, Greg KH wrote:
> > On Wed, Mar 15, 2023 at 06:08:19AM +0000, Seymour, Shane M wrote:
> > > The following patch implements host state statistics via sysfs. The
> > > intent is to allow user space to see the state changes and be able
> > > to report when a host changes state. The files do not separate out
> > > the time spent into each state but only into three:
> >
> > Why does userspace care about these things at all?
>
> This is the most important question: Why are times spent in various
> states and transition counts important? Is this some kind of
> predictive failure system, or is it simply logging? If it's logging,
> wouldn't you get better information if we output state changes as they
> occur then they'd appear as timestamped entries in the syslog from
> which all these statistics could be deduced?

Hi James,

I had to write something to read the statistics to ensure that what was
being provided was sane and usable. Currently the program does:

1) Logging of state changes (with a count and what the current state
   is).
2) Logging a percentage of time spent in recovery over the last
   interval (default 10 minutes) if that percentage is increasing.

I do plan on implementing the following in the near future:

1) Keeping statistical information in memory (for at least):
   a) Hourly for the last 96 hours
   b) Daily for the last 90 days
2) Analysing that data hourly and daily to determine if there is a
   trend that is increasing or decreasing in terms of the count and
   the time spent (if any) in recovery. That is, are things getting
   better, worse, or staying the same.

My end goal is to provide at least some warning that there may be a
storage issue and whether it appears to be getting worse. I do want the
user space program to be something more than just something that logs
messages about state changes.

In regard to your idea about outputting state changes, it's interesting
but I can see several drawbacks.
The first is that if you use syslog you don't really have any idea
where the messages will end up. Different distros have different
destinations (e.g. messages vs syslog vs systemd journal) and you can
configure the syslog daemon so that the messages always end up on a
different system.

There will be issues handling those files as well. You need to cope
with log file rotation, how many copies of old messages/syslog files
are kept when rotated, whether they are compressed (and reading them
when they are), whether any are missing, and how far to go back if
there are a lot of old messages/syslog files. I think you would need to
look at them all to determine which files were relevant and needed to
be processed.

Having said that, none of those issues are insurmountable, but they
make it hard to do the analysis I want to implement on the data. The
variability in the quantity of available data (how many messages/syslog
files you have) over a period of time provides challenges.

>
> > What tool needs them and what can userspace do with the
> > information?
> >
> > [...]
> > > A (GPLv2) program called hostmond will be released in a few months
> > > that will monitor these interfaces and report (local host only via
> > > syslog(3C)) when hosts change state.
> >
> > We kind of need to see this before the kernel changes can be accepted
> > for obvious reasons, what is preventing that from happening now?
>
> I don't think that's a requirement. The whole point of sysfs is it's
> user readable, so we don't need a tool to make use of its entries. On
> the other hand if this tool can help elucidate the use case for these
> statistics, then publishing it now would be useful to help everyone
> else understand why this is useful.

The main use of the existing code at the moment would be to make it
easier to work out how to read the statistics from the sysfs files.
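
For illustration only, reading such a statistic and turning it into the
per-interval recovery percentage described earlier could look roughly
like the sketch below. It is not hostmond code: the helper names, the
assumption that a file holds a single cumulative counter in
milliseconds, and the simple "newest sample is higher than the previous
one" test are all hypothetical.

```python
from collections import deque
from pathlib import Path


def read_counter(path):
    """Read one integer from a sysfs-style text file (format is assumed)."""
    return int(Path(path).read_text().strip())


def recovery_percent(prev_ms, curr_ms, interval_ms):
    """Percentage of an interval spent in recovery.

    prev_ms and curr_ms are two readings of a cumulative counter
    (assumed to be in milliseconds); interval_ms is the time between
    the two readings.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    delta = max(0, curr_ms - prev_ms)  # treat a counter reset as no time spent
    return min(100.0, 100.0 * delta / interval_ms)


class RecoveryHistory:
    """Keep only the most recent per-interval percentages in memory."""

    def __init__(self, max_samples=96):  # e.g. hourly for the last 96 hours
        self.samples = deque(maxlen=max_samples)

    def add(self, percent):
        self.samples.append(percent)

    def is_increasing(self):
        """True if the newest sample is higher than the one before it."""
        return len(self.samples) >= 2 and self.samples[-1] > self.samples[-2]
```

A monitoring loop would call read_counter() once per interval on
whatever file the patch exposes, pass consecutive readings to
recovery_percent(), and log only when is_increasing() returns true. A
bounded deque keeps memory use fixed regardless of uptime, which avoids
the variable-history problem that log files have.
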
If the feedback is to wait until I've fully implemented the user space
program with the analysis component and made it available, I'm more
than happy to do that.

Thanks
Shane

>
> James