how to demonstrate cache-hit improvements (in sysfs callbacks) ?

Jim Cromie <jim.cromie@xxxxxxxxx> · Fri, 01 Sep 2006 15:19:03 -0600

hi folks,

over on lm-sensors, we have a discussion about the merits of
2D combo-callbacks vs 1D callbacks.  I'd like advice on how to measure
the cache effects of several variations.

background and terminology:

0-D callback:
A dedicated callback that's used to handle a single instance of an
attribute file.

1-D callback:
A routine that serves an 'array' of identical attributes, e.g. voltage
inputs 0..10, temp 0..6, etc.
These use struct sensor_device_attribute, which adds an index to struct
device_attribute.
The callback 'backcasts' from the latter to the former (via
to_sensor_dev_attr(), a container_of() wrapper), then grabs the index,
which it uses to fetch the desired data.
The object-code savings can be significant, especially when used to clean
out functions which are macro-repeated (with ##offsets) for attr arrays of
10 or more items.

static ssize_t show_fan_min(struct device *dev,
                            struct device_attribute *devattr, char *buf)
{
        struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
        int index = attr->index;
        struct pc87360_data *data = pc87360_update_device(dev);

        return sprintf(buf, "%u\n", FAN_FROM_REG(data->fan_min[index],
                       FAN_DIV_FROM_REG(data->fan_status[index])));
}

static struct sensor_device_attribute fan_min[] = {
        SENSOR_ATTR(fan1_min, S_IWUSR | S_IRUGO, show_fan_min, set_fan_min, 0),
        SENSOR_ATTR(fan2_min, S_IWUSR | S_IRUGO, show_fan_min, set_fan_min, 1),
        SENSOR_ATTR(fan3_min, S_IWUSR | S_IRUGO, show_fan_min, set_fan_min, 2),
};

2-D callbacks:
This goes one step further, and handles several different kinds of
attribute in one combo-callback, selected by a second index (attr->nr).

static ssize_t show_fan(struct device *dev,
                        struct device_attribute *devattr, char *buf)
{
      struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);
      int idx = attr->index;
      int func = attr->nr;
      struct pc87360_data *data = pc87360_update_device(dev);
      unsigned res = -1;

      switch(func) {
      case FN_FAN_INPUT:
              res = FAN_FROM_REG(data->fan[idx],
                                 FAN_DIV_FROM_REG(data->fan_status[idx]));
              break;
      case FN_FAN_MIN:
              res = FAN_FROM_REG(data->fan_min[idx],
                                 FAN_DIV_FROM_REG(data->fan_status[idx]));
              break;
      case FN_FAN_STATUS:
              res = FAN_STATUS_FROM_REG(data->fan_status[idx]);
              break;
      case FN_FAN_DIV:
              res = FAN_DIV_FROM_REG(data->fan_div[idx]);
              break;
      default:
              printk(KERN_ERR "unknown attr fetch\n");
      }
      return sprintf(buf, "%u\n", res);
}

I've applied this second technique to one driver, and see the expected
code shrink:

hwmon-stuff]$ size pre19-*/drivers/hwmon/pc87360.o
  text    data     bss     dec     hex filename
 12372    3016      29   15417    3c39 pre19-0/drivers/hwmon/pc87360.o
 10960    3432      29   14421    3855 pre19-1/drivers/hwmon/pc87360.o

[jimc@harpo hwmon-stuff]$ nm -S pre19-0/drivers/hwmon/pc87360.o | grep show_in
00000781 00000023 t show_in_alarms
0000062b 00000048 t show_in_input
000006bb 00000048 t show_in_max
00000673 00000048 t show_in_min
00000703 0000002c t show_in_status
[jimc@harpo hwmon-stuff]$ nm -S pre19-1/drivers/hwmon/pc87360.o | grep show_vin
000009e4 000000ad t show_vin

That's all well and good, but the cache effect has been questioned:

> For example I wonder how the code above interacts with the
> CPU cache, compared to 1-level-indexed callbacks, in the typical
> "sensors" scenario. I don't really have the time to investigate this,
> unfortunately. Switch/case is usually not recommended in performance
> terms, even though I'd expect gcc to optimize it relatively nicely if
> the "func" values are chosen wisely.

My initial thought is that, as a first rule of thumb, smaller code means
better cache hit rates, especially in the instruction cache, which I think
is the only concern being raised.  Data-cache effects are far more
global / action-at-a-distance, and hence unpredictable (correct?)

Second, the standard usage pattern here is `sensors`, which appears (from
its output at least) to read all attributes of each sensor in turn.  FWIW,
given the naming convention used for hwmon driver attributes, that is
the same access order produced by
   cat /sys/devices/platform/i2c-9191/9191-6620/*

But, rather than arguing from what I 'know', I'd like to learn some stuff
I don't know.
SOOOO

How would I go about measuring this?
It seems that this has generic value, and I'm happy to put it up on 'our'
;-) wiki.

tia
jimc

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/