This patch is based on Andy Lutomirski's iMC SMBus driver patch-set https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope this rewrite will:

Overview

Modern Intel memory controllers host an SMBus controller and a connection to the DIMMs and their thermal sensors. The memory controller firmware has three modes of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal Throttling (OLTT) and none.

- CLTT: The memory controller firmware periodically accesses the DIMM temperature sensors over the SMBus.
- OLTT: The memory controller firmware does not access the DIMM temperature sensors over the SMBus but approximates (guesses) their temperature.

Depending on the temperature, the memory controller firmware may throttle the memory bandwidth and the like. Only one mode of operation can be used at a time. Intel recommends CLTT. This is also the default in our BIOS.

Original Driver and its Rewrite

The original driver, i2c-imc.c, was an iMC SMBus controller driver that provided access to the DIMM thermal sensors. A second driver, dimm-bus.c, also part of Andy's patch-set, instantiated the thermal sensors.

The original driver was written for the memory controller found in Sandy Bridge CPUs. Either the Sandy Bridge documentation is incomplete or the functionality is limited: it was not possible to use this driver while the memory controller was in CLTT mode, as the driver and the firmware were both accessing the memory controller without arbitration. We ran this driver on our Broadwell CPU and the driver's internal consistency check failed every 30 minutes or so.

We rewrote this driver to support Broadwell's memory controller 8086.6fa8. Over time, support for more memory controllers should be added. Our documentation (Intel Xeon Processor D-1500 Product Family External Design Specification (EDS), Volume Two: Core and Uncore Registers, Volume 2 of 5, Rev. 2.3) hints at how to make OS drivers and firmware co-exist in CLTT mode.
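The EDS procedure, summarized in the four steps listed next, boils down to a save / quiesce / drain / restore sequence around every OS transaction. As an illustration only, here is a user-space shell model of that ordering. It is not driver code: in the real driver tsod_polling_interval is a memory controller register field, whereas here it is a plain shell variable, and the function name os_smbus_transaction, the initial value 8 and the DRAIN_DELAY variable are hypothetical. The model also assumes a sleep that accepts fractional seconds (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Hypothetical user-space model of the OS/firmware arbitration hinted at in
# the EDS. NOT driver code: tsod_polling_interval is a shell variable that
# stands in for the memory controller register field of the same name.

tsod_polling_interval=8   # hypothetical firmware (CLTT) polling interval
DRAIN_DELAY=0.01          # 10 ms; assumes a sleep that accepts fractions

# Run "$@" as one OS-owned SMBus transaction.
os_smbus_transaction() {
    _saved_interval=$tsod_polling_interval

    # Step 1: stay in CLTT mode, but stop the firmware from polling.
    tsod_polling_interval=0

    # Step 2: let a possibly in-flight firmware transaction drain.
    sleep "$DRAIN_DELAY"

    # Step 3: the OS now has exclusive access to the SMBus.
    if "$@"; then _rc=0; else _rc=$?; fi

    # Step 4: restore the previous interval, even if the transaction failed.
    tsod_polling_interval=$_saved_interval
    return "$_rc"
}
```

The point of the model is the ordering: polling is disabled before the drain delay starts, and the saved interval is restored on both the success and the failure path, so an SMBus error on the OS side cannot leave the firmware permanently unable to poll.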
In short:

- don't (necessarily) disable CLTT mode, but set tsod_polling_interval to 0
- wait 10 ms to drain a potential in-flight firmware CLTT transaction
- the OS now has exclusive access to the SMBus
- set tsod_polling_interval back to its previous value

Our patch provides proper arbitration between the OS and the firmware on Broadwell.

The original patch-set also provided an additional driver, dimm-bus.c, to instantiate the temperature sensors. It had some drawbacks:

- The probe function i2c_scan_dimm_bus() blindly enumerates potential DIMM sensor i2c addresses, causing the SBE bit to be set 6 times on our system. That is dangerous (see the comment at "if (stat & SMBSTAT_SBE)" in i2c-imc.c). The i2c addresses of the actual temperature sensors are known to the memory controller (when in CLTT mode) and don't need to be blindly enumerated.
- The probe function i2c_scan_dimm_bus() blindly instantiates 10 temperature sensors, although our system had only 2 DIMMs (with 1 temperature sensor each). The remaining 8 temperature sensors returned 0.
- As already pointed out, the instantiations happen in a separate driver, dimm-bus.c. The iMC SMBus driver i2c-imc.c calls into dimm-bus.c to do its job. That does not feel right.

I don't know how to do this better, so for now I even moved the instantiations into the iMC SMBus driver itself (imc_instantiate_sensors()). Please advise here.

The mapping of DIMMs to i2c adapters and addresses is confusing at best.
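Both candidate mappings discussed next are more regular than their tables suggest. The smb_stat_0 / dimm-bus.c variant uses four slots per channel (address 0x18 + 4 * (channel mod 2) + slot), while the experimental variant uses two (address 0x18 + 2 * (channel mod 2) + slot); in both, channels 0-1 sit on the first adapter and channels 2-3 on the second. The sketch below encodes only the experimental variant. It is an assumption, not a verified fact: only channel 0, slot 0 and channel 1, slot 0 were actually observed, the adapter numbers i2c-1 and i2c-2 are simply the ones seen on our systems, and the helper name imc_dimm_to_i2c is hypothetical. The driver itself does not need this mapping in CLTT mode.

```shell
#!/bin/sh
# Hypothetical helper: map an iMC channel/slot to "i2c-<adapter> <address>"
# following the experimentally observed mapping (2 slots per channel,
# channels 0-1 on i2c-1, channels 2-3 on i2c-2). Unverified beyond
# channel 0/1, slot 0.
imc_dimm_to_i2c() {
    channel=$1
    slot=$2
    if [ "$channel" -lt 0 ] || [ "$channel" -gt 3 ] ||
       [ "$slot" -lt 0 ] || [ "$slot" -gt 1 ]; then
        echo "expected channel 0-3 and slot 0-1" >&2
        return 1
    fi
    adapter=$(( channel / 2 + 1 ))                  # two channels per adapter
    address=$(( 0x18 + 2 * (channel % 2) + slot ))  # two slots per channel
    printf 'i2c-%d 0x%02x\n' "$adapter" "$address"
}
```

For example, imc_dimm_to_i2c 1 0 prints "i2c-1 0x1a", which matches the sensor found for the DIMM at channel 1, slot 0.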
From smb_stat_0 and from Andy's dimm-bus.c driver, I gain the impression the mapping may be (each entry applies only if a DIMM is populated in that slot):

  channel  slot  adapter  address
  00       00    i2c-1    0x18
  00       01    i2c-1    0x19
  00       02    i2c-1    0x1a
  00       03    i2c-1    0x1b
  01       00    i2c-1    0x1c
  01       01    i2c-1    0x1d
  01       02    i2c-1    0x1e
  01       03    i2c-1    0x1f
  02       00    i2c-2    0x18
  02       01    i2c-2    0x19
  02       02    i2c-2    0x1a
  02       03    i2c-2    0x1b
  03       00    i2c-2    0x1c
  03       01    i2c-2    0x1d
  03       02    i2c-2    0x1e
  03       03    i2c-2    0x1f

Experimentally, I gain the impression it is rather (again, each entry applies only if a DIMM is populated in that slot):

  channel  slot  adapter  address
  00       00    i2c-1    0x18
  00       01    i2c-1    0x19
  01       00    i2c-1    0x1a
  01       01    i2c-1    0x1b
  02       00    i2c-2    0x18
  02       01    i2c-2    0x19
  03       00    i2c-2    0x1a
  03       01    i2c-2    0x1b

Why? Because on our system we see temperature sensors at i2c-1 0x18 and i2c-1 0x1a, and BIOS and EDAC tell us we have DIMMs on channel 0, slot 0 and channel 1, slot 0:

  [ 9.522781] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 16, rank: 2, row: 0x10000, col: 0x400
  [ 9.522786] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 16, rank: 2, row: 0x10000, col: 0x400

When in OLTT mode, the sensors need to be manually instantiated, e.g.
  # echo jc42 0x18 > /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/new_device
  # echo jc42 0x1a > /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/new_device

In CLTT mode (we expect almost everyone to configure CLTT mode in their BIOS), the new driver knows where DIMMs are populated (see the arguments to imc_instantiate_sensor()) and instantiates the sensors itself. For this to work, we don't need to understand the mapping.

Unit Test

I had access to two systems with these memory configurations:

  System 1: DIMM at channel 1, slot 0.
  System 2: DIMMs at channel 0, slot 0 and at channel 1, slot 0.

I had no access to a system with DIMMs on channel 2 or 3.

We read the temperature sensors for 8 hours with CLTT enabled, and then for another 8 hours with OLTT enabled. We always got sane data: the internal sanity check always passed and dmesg stayed clean.

The grep at the end of each command filters out sane temperature values in the 20C to 39C range (temp1_input reports millidegrees Celsius, so such readings start with 2 or 3), so we can focus on abnormal temperature values and error messages.

First we stress-tested the driver (for 8 hours).

System 1:

  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v '^[23]' &
  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v '^[23]' &

System 2:

  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-0018/hwmon/hwmon?/temp1_input; done | grep -v '^[23]' &
  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-0018/hwmon/hwmon?/temp1_input; done | grep -v '^[23]' &
  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v '^[23]' &
  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v '^[23]' &

Next, we gave the firmware polling a better chance to start by adding a sleep of 2 seconds (for 8 hours).
System 1 and System 2:

  while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; sleep 2; done | grep -v '^[23]' &

~ Stefan

Stefan Schaeckeler (1):
  i2c: imc: Add support for Intel iMC SMBus host controller.

 MAINTAINERS                  |   5 +
 drivers/i2c/busses/Kconfig   |  15 ++
 drivers/i2c/busses/Makefile  |   1 +
 drivers/i2c/busses/i2c-imc.c | 515 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 536 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-imc.c

--
2.11.0