On Tue, 2012-10-23 at 18:30 +0100, Pawel Moll wrote:

> === Option 1: Trace event ===
>
> This seems to be the "cheapest" option. Simply defining a trace event
> that can be generated by a hwmon (or any other) driver makes the
> interesting data immediately available to any ftrace/perf user. Of
> course it doesn't really help with the cpufreq case, but it seems to
> be a good place to start with.
>
> The question is how to define it... I've come up with two prototypes:
>
> = Generic hwmon trace event =
>
> This one allows any driver to generate a trace event whenever any
> "hwmon attribute" (measured value) gets updated. The rate at which
> the updates happen can be controlled by the already existing
> "update_interval" attribute.
>
> 8<-------------------------------------------
> TRACE_EVENT(hwmon_attr_update,
> 	TP_PROTO(struct device *dev, struct attribute *attr, long long input),
> 	TP_ARGS(dev, attr, input),
>
> 	TP_STRUCT__entry(
> 		__string(	dev,		dev_name(dev))
> 		__string(	attr,		attr->name)
> 		__field(	long long,	input)
> 	),
>
> 	TP_fast_assign(
> 		__assign_str(dev, dev_name(dev));
> 		__assign_str(attr, attr->name);
> 		__entry->input = input;
> 	),
>
> 	TP_printk("%s %s %lld", __get_str(dev), __get_str(attr), __entry->input)
> );
> 8<-------------------------------------------
>
> It generates an ftrace message like this:
>
> <...>212.673126: hwmon_attr_update: hwmon4 temp1_input 34361
>
> One issue with this is that some external knowledge is required to
> relate a number to a processor core. Or maybe it's not an issue at all
> because it should be left for the user(space)?

If the external knowledge can be characterized in a userspace tool with
the given data here, I see no issues with this.

> = CPU power/energy/temperature trace event =
>
> This one is designed to emphasize the relation between the measured
> value (whether it is energy, temperature or any other physical
> phenomenon, really) and CPUs, so it is quite specific (too specific?)
> 8<-------------------------------------------
> TRACE_EVENT(cpus_environment,
> 	TP_PROTO(const struct cpumask *cpus, long long value, char unit),
> 	TP_ARGS(cpus, value, unit),
>
> 	TP_STRUCT__entry(
> 		__array(	unsigned char,	cpus,	sizeof(struct cpumask))
> 		__field(	long long,	value)
> 		__field(	char,		unit)
> 	),
>
> 	TP_fast_assign(
> 		memcpy(__entry->cpus, cpus, sizeof(struct cpumask));

Copying the entire cpumask seems like overkill, especially when you
have 4096-CPU machines.

> 		__entry->value = value;
> 		__entry->unit = unit;
> 	),
>
> 	TP_printk("cpus %s %lld[%c]",
> 		__print_cpumask((struct cpumask *)__entry->cpus),
> 		__entry->value, __entry->unit)
> );
> 8<-------------------------------------------
>
> And the equivalent ftrace message is:
>
> <...>127.063107: cpus_environment: cpus 0,1,2,3 34361[C]
>
> It's a cpumask, not just a single cpu id, because the sensor may
> measure the value per set of CPUs, e.g. the temperature of the whole
> silicon die (so all the cores) or the energy consumed by a subset of
> cores (this is my particular use case - two meters monitor a cluster
> of two processors and a cluster of three processors, all working as
> an SMP system).
>
> Of course the cpus __array could actually be a special __cpumask
> field type (I've just hacked __print_cpumask so far). And I've just
> realised that the unit field should actually be a string to allow
> unit prefixes to be specified (the above should obviously be
> "34361[mC]", not "[C]"). Also - excuse the "cpus_environment" name -
> this was the best I was able to come up with at the time and I'm
> eager to accept any alternative suggestions :-)

Perhaps making a field that can be a subset of cpus may be better.
That way we don't waste the ring buffer with lots of zeros. I'm
guessing that it will only be a group of cpus, and not a scattered
list?

Of course, I've seen boxes where the cpu numbers went from core to
core. That is, cpu 0 was on core 1, cpu 1 was on core 2, and then it
would repeat.
cpu 8 was on core 1, cpu 9 was on core 2, etc. But still, this could
be compressed somehow.

I'll let others comment on the rest.

-- Steve

_______________________________________________
lm-sensors mailing list
lm-sensors@xxxxxxxxxxxxxx
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors