Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

Maximilian Luz <luzmaximilian@xxxxxxxxx> · Thu, 2 May 2024 22:05:44 +0200

Hi,

On 4/16/24 11:08 PM, Guenter Roeck wrote:
On Tue, Apr 16, 2024 at 09:00:05PM +0200, Maximilian Luz wrote:
On 4/16/24 3:30 PM, Guenter Roeck wrote:
On Sat, Mar 30, 2024 at 12:24:01PM +0100, Maximilian Luz wrote:

[...]

+static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
+{
+	struct ssam_tmp_get_name_rsp name_rsp;
+	int status;
+
+	status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
+	if (status)
+		return status;
+
+	/*
+	 * This should not fail unless the name in the returned struct is not
+	 * null-terminated or someone changed something in the struct
+	 * definitions above, since our buffer and struct have the same
+	 * capacity by design. So if this fails, blow this up with a warning.
+	 * Since the more likely cause is that the returned string isn't
+	 * null-terminated, we might have received garbage (as opposed to just
+	 * an incomplete string), so also fail the function.
+	 */
+	status = strscpy(buf, name_rsp.name, buf_len);
+	WARN_ON(status < 0);

Not acceptable. From include/asm-generic/bug.h:

   * Do not use these macros when checking for invalid external inputs
   * (e.g. invalid system call arguments, or invalid data coming from
   * network/devices), and on transient conditions like ENOMEM or EAGAIN.
   * These macros should be used for recoverable kernel issues only.

Hmm, I always interpreted that as "do not use for checking user-defined
input", which this is not.

"invalid data coming from network/devices" is not user-defined input.

The reason I added/requested it here was to check for "bugs" in how we
think the interface behaves (and our definitions related to it) as the
interface was reverse-engineered. Generally, when this fails I expect
that we made some mistake in our code (or in what we assume about the
interface), which likely causes us to interpret the received data as
"garbage" (rather than the EC sending corrupted data, which is generally
not the case thanks to CRC checking and validation in the SAM driver).
Hence, I personally would prefer that this blows up in a big warning with
a trace attached to it, so that an end-user can easily report this to us
and we can deal with it appropriately, as opposed to some one-line
error message that will likely get overlooked or not taken as seriously.

I have heard the "This backtrace is absolutely essential" argument before,
including the "will be fixed" part. Chromebooks report more than 500,000
warning backtraces _every single day_. None of them is getting fixed.

If you still insist, I could change that to a dev_err() message. Or
maybe make the comment a bit clearer.

dev_err() would be acceptable. WARN() or WARN_ON() are no-go.

Sorry for the delayed response. I will change this to a dev_err() then
and try to re-spin the patches this weekend.
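
For illustration only, the dev_err() variant could be sketched roughly as
below. This is a userspace approximation and not the re-spun patch:
strscpy_sketch() imitates the return convention of the kernel's strscpy(),
fprintf() stands in for dev_err(), and NAME_LEN, struct name_rsp,
get_name_sketch() and the message text are all hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 8	/* hypothetical; stands in for the real response size */

/* Hypothetical stand-in for the response struct returned by the EC. */
struct name_rsp {
	char name[NAME_LEN];
};

/*
 * Userspace approximation of the kernel's strscpy(): copy at most
 * count - 1 bytes, always NUL-terminate the destination, and return the
 * copied length, or -E2BIG if no NUL was found within count bytes.
 */
static int strscpy_sketch(char *dst, const char *src, size_t count)
{
	const char *nul;
	size_t len;

	if (count == 0)
		return -E2BIG;

	nul = memchr(src, '\0', count);
	if (!nul) {
		/* Source is truncated: keep what fits and report failure. */
		memcpy(dst, src, count - 1);
		dst[count - 1] = '\0';
		return -E2BIG;
	}

	len = (size_t)(nul - src);
	memcpy(dst, src, len + 1);
	return (int)len;
}

/*
 * Copy the name out of the response. On failure, emit a single error
 * line (dev_err() in the kernel) instead of a WARN backtrace, and still
 * fail the function.
 */
static int get_name_sketch(const struct name_rsp *rsp, char *buf, size_t buf_len)
{
	int status = strscpy_sketch(buf, rsp->name, buf_len);

	if (status < 0) {
		fprintf(stderr, "received non-null-terminated sensor name string\n");
		return status;
	}

	return 0;
}
```

The caller still gets a negative error code when the name is not
null-terminated, but the report is one log line rather than a backtrace.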

Best regards,
Max