Re: [PATCH v1 2/4] hwmon: (sfctemp) Add StarFive JH71x0 temperature sensor

Guenter Roeck <linux@xxxxxxxxxxxx> · Wed, 8 Feb 2023 07:16:52 -0800

On 2/8/23 04:40, Hal Feng wrote:
On Mon, 6 Feb 2023 11:21:38 -0800, Guenter Roeck wrote:
On 2/6/23 09:12, Hal Feng wrote:
On Tue, 3 Jan 2023 14:10:17 -0800, Guenter Roeck wrote:
On Tue, Jan 03, 2023 at 09:31:43AM +0800, Hal Feng wrote:
[...]

diff --git a/drivers/hwmon/sfctemp.c b/drivers/hwmon/sfctemp.c
new file mode 100644
index 000000000000..e56716ad9587
--- /dev/null
+++ b/drivers/hwmon/sfctemp.c
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021 Emil Renner Berthing <kernel@xxxxxxxx>
+ * Copyright (C) 2021 Samin Guo <samin.guo@xxxxxxxxxxxxxxxx>
+ */
+#include <linux/clk.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/hwmon.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+
+/*
+ * TempSensor reset. The RSTN can be de-asserted once the analog core has
+ * powered up. Trst(min 100ns)
+ * 0:reset  1:de-assert
+ */
+#define SFCTEMP_RSTN    BIT(0)

Missing include of linux/bits.h

Will add it. Thanks.


+
+/*
+ * TempSensor analog core power down. The analog core will be powered up
+ * Tpu(min 50us) after PD is de-asserted. RSTN should be held low until the
+ * analog core is powered up.
+ * 0:power up  1:power down
+ */
+#define SFCTEMP_PD    BIT(1)
+
+/*
+ * TempSensor start conversion enable.
+ * 0:disable  1:enable
+ */
+#define SFCTEMP_RUN    BIT(2)
+
+/*
+ * TempSensor conversion value output.
+ * Temp(C)=DOUT*Y/4094 - K
+ */
+#define SFCTEMP_DOUT_POS    16
+#define SFCTEMP_DOUT_MSK    GENMASK(27, 16)
+
+/* DOUT to Celcius conversion constants */
+#define SFCTEMP_Y1000    237500L
+#define SFCTEMP_Z    4094L
+#define SFCTEMP_K1000    81100L
+
+struct sfctemp {
+    /* serialize access to hardware register and enabled below */
+    struct mutex lock;
+    struct completion conversion_done;
+    void __iomem *regs;
+    struct clk *clk_sense;
+    struct clk *clk_bus;
+    struct reset_control *rst_sense;
+    struct reset_control *rst_bus;
+    bool enabled;
+};
+
+static irqreturn_t sfctemp_isr(int irq, void *data)
+{
+    struct sfctemp *sfctemp = data;
+
+    complete(&sfctemp->conversion_done);
+    return IRQ_HANDLED;
+}
+
+static void sfctemp_power_up(struct sfctemp *sfctemp)
+{
+    /* make sure we're powered down first */
+    writel(SFCTEMP_PD, sfctemp->regs);
+    udelay(1);
+
+    writel(0, sfctemp->regs);
+    /* wait t_pu(50us) + t_rst(100ns) */
+    usleep_range(60, 200);
+
+    /* de-assert reset */
+    writel(SFCTEMP_RSTN, sfctemp->regs);
+    udelay(1); /* wait t_su(500ps) */
+}
+
+static void sfctemp_power_down(struct sfctemp *sfctemp)
+{
+    writel(SFCTEMP_PD, sfctemp->regs);
+}
+
+static void sfctemp_run_single(struct sfctemp *sfctemp)
+{
+    writel(SFCTEMP_RSTN | SFCTEMP_RUN, sfctemp->regs);
+    udelay(1);
+    writel(SFCTEMP_RSTN, sfctemp->regs);

The datasheet (or, rather, programming manual) does not appear
to be public, so I have to guess here.

The code suggests that running a single conversion may be a choice,
not a requirement. If it is indeed a choice, the reasoning needs to be
explained since it adds a lot of complexity and dependencies to the
driver (for example, interrupt support is only mandatory or even needed
due to this choice). It also adds a significant delay to temperature
read operations, which may have practical impact on thermal control
software.

If the chip only supports single temperature readings, that needs to be
explained as well (and why SFCTEMP_RUN has to be reset in that case).

The chip supports continuous conversion. When you set SFCTEMP_RUN, the
temperature raw data will be generated all the time. However, it will
also generate interrupts all the time when the conversion is finished,
because of the hardware limitation. So in this driver, we just support
the single conversion.


Sorry, I don't follow the logic. The interrupt is, for all practical
purposes, useless because there are no limits and exceeding any such
limits is therefore not supported. The only reason to have and enable
to interrupt is because continuous mode is disabled.

The code could be simplified a lot if interrupt support would be
dropped and continuous mode would be enabled.

If we enable continuous mode, which means SFCTEMP_RUN remains asserted,
the conversion finished interrupt will be raised after each sample
time (8.192 ms). Within a few minutes, a lot of interrupts are raised,
as showed below.

# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   1:          0          0          0          0  SiFive PLIC   1 Edge      ccache_ecc
   2:          1          0          0          0  SiFive PLIC   3 Edge      ccache_ecc
   3:          1          0          0          0  SiFive PLIC   4 Edge      ccache_ecc
   4:          0          0          0          0  SiFive PLIC   2 Edge      ccache_ecc
   5:       1116       1670        411       1466  RISC-V INTC   5 Edge      riscv-timer
   6:      32093          0          0          0  SiFive PLIC  81 Edge      120e0000.temperature-sensor
  10:       1233          0          0          0  SiFive PLIC  32 Edge      ttyS0
IPI0:       117         62        123        117  Rescheduling interrupts
IPI1:       278        353        105        273  Function call interrupts
IPI2:         0          0          0          0  CPU stop interrupts
IPI3:         0          0          0          0  CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0  IRQ work interrupts
IPI5:         0          0          0          0  Timer broadcast interrupts

If we enable continuous mode and drop the interrupt support in the
driver, the kernel will not know the interrupts but a lot of interrupts
are still raised in hardware. Can we do such like that?

Why not ? It just stays raised. That happens a lot.

Without the interrupt support, the temperature we read may be the value
generated in the last cycle.

That would be highly unusual and should be documented.



I think the temperature has its value only when we read it, so we start

"may be" ? "I think" ? That means you don't know ? Maybe test it, or ask
the chip designers.

conversion only when we read the temperature. Further more, it will
consume more power if we enable continuous mode.


Usually that is not a concern, much less so than delaying each reader.

Ultimately, sure, you can do whatever you want. I'll still accept the driver.
I do expect you to explain your reasons (all of them) in the driver, though.

If you don't _know_ if the temperature is updated in continuous mode,
please state exactly that in the comments. Also explain how much power
is saved by not running in continuous mode. I don't want anyone to come
back later on and change the code because they don't know the reasons
why it doesn't use continuous mode.

Thanks,
Guenter

Best regards,
Hal